Beyond intelligent routing, orchestrating multi-node AI deployments requires bulletproof underlying primitives, which is why Google leads development of the Kubernetes LeaderWorkerSet (LWS) API. LWS enables llm-d to orchestrate wide expert parallelism and to disaggregate the compute-heavy prefill and memory-heavy decode phases into independently scalable pods. With widespread industry adoption, LWS now orchestrates a rapidly growing footprint of production AI workloads, managing massive fleets of TPUs and GPUs at global scale. Complementing this orchestration, Google recently extended vLLM natively for Cloud TPUs. Featuring a unified PyTorch and JAX backend alongside innovations like Ragged Paged Attention v3, this integration delivers up to 5x throughput gains over our initial release last year. Together, whether you are scaling on Google Cloud TPUs or NVIDIA GPUs, these advancements help ensure state-of-the-art AI serving remains a highly optimized, accelerator-agnostic capability.
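As a rough illustration of how LWS expresses a multi-pod serving group, here is a minimal sketch of a LeaderWorkerSet manifest for a decode pool. All names, images, replica counts, and resource figures below are illustrative placeholders rather than a tested configuration; see the llm-d guides for proven deployments.

```yaml
# Hypothetical sketch of an LWS-managed decode pool. A prefill pool would be
# a separate LeaderWorkerSet, scaled independently of this one.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-decode            # placeholder name for the decode pool
spec:
  replicas: 2                  # number of leader+worker groups in this pool
  leaderWorkerTemplate:
    size: 4                    # pods per group: 1 leader + 3 workers
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "8"          # placeholder accelerator count
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "8"
```

Because each phase lives in its own LeaderWorkerSet, the decode pool's `replicas` can grow with memory-bound demand while the prefill pool scales with compute-bound demand.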
Building next-gen AI infrastructure together
To build the ultimate AI infrastructure, we must bridge the gap between cloud-native Kubernetes orchestration and frontier AI research. The shift to production-grade gen AI requires an engine built on trust, transparency, and deep collaboration with the AI/ML leaders pushing the boundaries of what is possible.
We are incredibly excited to partner with the Linux Foundation, the CNCF, the PyTorch Foundation, and the rest of the open-source community to build the next generation of AI infrastructure. By establishing “well-lit paths” — proven, replicable blueprints tested end-to-end under realistic load — we are ensuring that high-performance AI thrives as an open, universally accessible ecosystem that empowers innovation without boundaries.
We invite large foundation model builders, AI natives, platform engineers, and AI researchers to join us in shaping the open future of AI inference:
- Explore the well-lit paths: Visit the llm-d guides to start deploying SOTA inference stacks on your infrastructure today.
- Learn more: Check out the official website at https://llm-d.ai/
- Contribute: Join the community on Slack and get involved in our GitHub repositories at https://github.com/llm-d/.
Join us in celebrating llm-d at the CNCF! We look forward to scaling the engine together.