Architecting AI infrastructure for U.S. Winter Olympians

Solving occlusion is only half the battle. Delivering these insights quickly—seconds after a U.S. Olympian lands —requires heavy-duty infrastructure. We built a high-performance inference engine on Google Cloud to handle the intense MLOps demands of the competition.

The hardware foundation: TPUs

At the core of the pipeline are Google’s Tensor Processing Units (TPUs), tasked with the heaviest matrix math. An encoder first compresses the video into a latent representation, and a video transformer model predicts the 3D joint positions.

To eliminate the standard cloud “cold start” delay, we statically provisioned dedicated TPU slices for the duration of Team USA’s competition at the Olympic Winter Games. This kept the models perpetually loaded in High-Bandwidth Memory (HBM). When a video arrives, it hits a “warm” TPU, guaranteeing near-instantaneous, predictable inference without the resource contention of a multi-tenant environment.

Orchestration at scale: Vertex AI

Deploying to a single lab server is easy; orchestrating live action at the Olympic Games is not. Vertex AI provided the unified control plane to manage volume, complexity, and latency:

Horizontal scaling with batch prediction: Using the Vertex AI Batch Prediction API, incoming video is instantly directed to a distributed network of workers. This decouples model loading from inference, allowing the system to scale horizontally and process multiple athletes simultaneously without choking.
Volume and elasticity: Video analysis of U.S. Olympians is what we describe as ‘bursty’ – computational needs spike for the short duration of the athlete runs. . Vertex AI dynamically provisions resources to absorb these data spikes, rather than keeping resources always-on.
Security and exclusivity: To protect proprietary Team USA data, we established a Private Endpoint within a Virtual Private Cloud (VPC). Authorized traffic travels via dedicated network pathways, isolating the engine from the public internet to reduce the attack surface and minimize latency.

Beyond the snow

A system capable of reliable pose estimation under extreme winter conditions—high speeds, constant occlusion, and a requirement for speed—is a system that generalizes. We believe the underlying AI architecture, and the ability to provide generalized intelligence from structured data feeds can enable a number of use cases beyond winter athletics.

Imagine a conversational AI physical therapy coach that analyzes and helps with movement form. Or, robot assistance for a factory worker that is triggered by cues noticed in their posture. These are all potential use cases where specialized sensor AI, paired with powerful reasoning models, can provide helpful insights and actions.

Architecting AI infrastructure for U.S. Winter Olympians

Run evals for Conversational Analytics agents using Prism