
GKE and OSS innovation at KubeCon EU 2026

March 24, 2026

As the cloud-native community gathers in Amsterdam for KubeCon + CloudNativeCon Europe this week, we’re excited to highlight some of the work we are doing to support both the open-source Kubernetes ecosystem and Google Kubernetes Engine (GKE). From breaking down the walls between cluster operating modes to making Kubernetes the absolute best place to run AI agents and Ray, here’s a look at what we are rolling out.

Autopilot for everyone

Five years ago, we introduced GKE Autopilot, a fully managed GKE experience that dramatically simplified scaling and infrastructure management. Previously, choosing between GKE Autopilot mode and Standard mode was a “fork in the road” decision made at cluster creation time. If you started with Standard and later wanted to switch to Autopilot, you had to create an entirely new cluster. This created friction for organizations managing mixed clusters, where some workloads required strict node-level control while others needed seamless, hands-off scaling.

Meet the new GKE, where Autopilot is available for every cluster. Autopilot compute classes are now available for Standard clusters, allowing you to turn on Autopilot at any time, on a per-workload basis. Powered by GKE Autopilot’s Container-Optimized Compute Platform (COCP), you can unlock near-real-time, vertically and horizontally scalable compute that provides the exact capacity that you need, when you need it, at the best price and performance.
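As a hedged sketch of what the per-workload opt-in can look like: in GKE, a workload selects a compute class through the `cloud.google.com/compute-class` node selector. The compute-class name and container image below are illustrative assumptions, not names from this announcement.

```yaml
# Sketch: opting a single Deployment into an Autopilot compute class
# on a Standard cluster. "balanced-autopilot" is an assumed class name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      nodeSelector:
        cloud.google.com/compute-class: balanced-autopilot  # per-workload opt-in
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/web/frontend:v1  # assumed image
        resources:
          requests:
            cpu: "500m"
            memory: 512Mi
```

Workloads without the node selector keep running exactly as before, which is what makes the migration incremental rather than a cluster-recreation event.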

Furthermore, we are happy to announce that we will open-source the GKE Cluster Autoscaler, one of the core components driving infrastructure provisioning for our customers. Our goal is to provide a vendor-neutral platform that the OSS community can benefit from and build on.

Toward CNCF Kubernetes AI Conformance

As the industry moves toward AI at massive scale, standardization is paramount. Together with the Kubernetes community last year, we launched the CNCF Kubernetes AI Conformance program, which simplifies AI/ML on Kubernetes by establishing a standard for cluster interoperability and portability. We are proud to announce that GKE is certified as an AI-conformant platform, so that your models and AI tools can be ported across environments.

Looking ahead to the upcoming v1.36 Kubernetes release, the AI Conformance community is proposing three new requirements to address the evolving needs of AI serving: advanced inference ingress, disaggregated serving, and high-performance networking. Google Cloud is committed to supporting these emerging community standards through GKE Inference Gateway, llm-d, and DRANET.
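For orientation, GKE Inference Gateway builds on the Gateway API Inference Extension, where model servers are grouped behind an `InferencePool`. The following is a hedged sketch only; the API group, version, and field names vary by release, and the label, port, and extension-service names are assumptions.

```yaml
# Sketch of an InferencePool grouping model-server pods behind an
# inference-aware routing extension (names are illustrative).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-pool
spec:
  selector:
    app: vllm-server           # assumed label on the model-server pods
  targetPortNumber: 8000       # assumed serving port
  extensionRef:
    name: llm-endpoint-picker  # assumed endpoint-picker service
```

An HTTPRoute can then send traffic to the pool, letting the extension pick endpoints based on inference-specific signals (e.g., queue depth or KV-cache state) rather than plain round-robin.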

Model Context Protocol: An agent interface

To streamline how AI agents interact with Kubernetes, last year we introduced the open-source GKE Model Context Protocol (MCP) Server, which offers a standardized interface that allows agents to manage, analyze, and monitor workloads, clusters, and resources through a set of well-defined capabilities. By exposing these capabilities, the MCP Server makes it easier to integrate AI clients such as Gemini CLI and Antigravity, promoting more intelligent and automated management of Kubernetes environments.

Kubernetes as AI infrastructure

llm-d is officially a CNCF Sandbox project, which marks a significant step in evolving Kubernetes into state-of-the-art AI infrastructure. Launched in May 2025 as a collaborative effort with industry leaders like Red Hat and NVIDIA, llm-d provides a Kubernetes-native distributed inference framework designed to be hardware-agnostic and vendor-neutral.

The project addresses complex AI orchestration challenges by introducing well-lit paths for inference-aware traffic management, native orchestration for multi-node replicas, and advanced state management for hierarchical KV cache offloading. By bridging the gap between cloud-native orchestration and frontier AI research, llm-d democratizes high-performance AI serving and establishes open, reproducible benchmarks for inference performance across various accelerators. We plan to work with the CNCF AI Conformance program on llm-d to help ensure critical capabilities like disaggregated serving are interoperable across the ecosystem. For more on llm-d, check out our blog here. 

DRA is the new standard for resource management

Kubernetes was created in a simpler time, when CPU and memory were the only variables and clouds were seen as infinitely elastic. Today, of course, hardware is specialized and variable. Dynamic Resource Allocation (DRA) is an industry-standard solution for describing unique hardware in a standard format, allowing higher-level workloads and schedulers to optimize resources without access to low-level details about them.

Today, we’re proud to announce the open-source release of our DRA driver for TPUs, marking a significant milestone in bringing AI workload portability to the Kubernetes ecosystem. Google and NVIDIA partnered closely on the design and implementation of DRA in OSS Kubernetes in a collaborative push to establish a unified resource management standard. We are pleased to coordinate this release with the donation of the NVIDIA DRA Driver. This is in addition to our DRA driver for networking, DRANET, which is already available as a managed feature of GKE.
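A minimal sketch of the DRA request flow, assuming a hypothetical TPU DeviceClass: a workload asks for a device by class via a `ResourceClaimTemplate`, and the driver satisfies the claim at scheduling time. The device-class name, image, and API version below are assumptions (`resource.k8s.io` has moved through beta versions across Kubernetes releases).

```yaml
# Sketch: requesting one accelerator through DRA.
# "tpu.google.com" is an assumed DeviceClass name.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-tpu
spec:
  spec:
    devices:
      requests:
      - name: tpu
        deviceClassName: tpu.google.com
---
apiVersion: v1
kind: Pod
metadata:
  name: tpu-workload
spec:
  resourceClaims:
  - name: tpu
    resourceClaimTemplateName: single-tpu
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/ml/trainer:v1  # assumed image
    resources:
      claims:
      - name: tpu   # binds the container to the claim above
```

The point of the pattern is that the pod spec names a device class, not a vendor-specific resource string, which is what makes the workload portable across drivers and accelerators.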

Supporting the agentic wave: Inference and agents

The agentic AI wave is upon us, and we believe Kubernetes is unequivocally the best platform on which to run these agents. To execute LLM-generated code and interact with AI agents confidently, you need deep isolation, rapid startup times, and specialized infrastructure.

We are heavily investing in open-source inference work to make this a reality. By leveraging innovations like Kubernetes Agent Sandbox for secure, gVisor-backed isolation, and GKE Pod Snapshots, which drastically improve startup latency by restoring workloads from a memory snapshot, we are establishing a standard for agentic AI on Kubernetes and providing high performance and compute efficiency for agents running on GKE.
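The gVisor-backed isolation mentioned above is exposed on GKE through a RuntimeClass (GKE Sandbox). As a minimal sketch, a pod that executes untrusted, LLM-generated code can opt into the sandbox with a single field; the pod name and command here are illustrative.

```yaml
# Sketch: running untrusted agent-generated code under gVisor
# isolation via the "gvisor" RuntimeClass (GKE Sandbox).
apiVersion: v1
kind: Pod
metadata:
  name: agent-code-runner
spec:
  runtimeClassName: gvisor   # user-space kernel isolation via gVisor
  containers:
  - name: runner
    image: python:3.12-slim
    command: ["python", "-c", "print('hello from the sandbox')"]
    resources:
      requests:
        cpu: "250m"
        memory: 256Mi
```

Because gVisor interposes a user-space kernel between the workload and the host, a compromised or misbehaving code snippet is contained without giving up the standard pod lifecycle.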

Ray on Kubernetes: TPUs and better observability

Ray has become the standard for scaling demanding AI workloads, and we believe Kubernetes is a great place to run it. Until recently, official accelerator support was limited to NVIDIA GPUs. We are excited to announce TPU support in Ray v2.55, fully supported by Anyscale and Google.

Ray on Kubernetes users have historically struggled to debug and optimize performance because they didn’t have access to historical data about their jobs. To solve this, we are introducing the ability to debug issues after a RayJob has completed or terminated. The Ray History Server uses KubeRay to persist logs, state, and metrics from live RayJobs and reproduce them in the Ray Dashboard. The Ray History Server (alpha) is available to try today.
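For context, the RayJob being debugged is a KubeRay custom resource. The sketch below shows the general shape of one, trimmed to the essentials; the script path and worker sizing are assumptions, and `shutdownAfterJobFinishes` illustrates why externally persisted history matters, since the live cluster (and its dashboard) disappears when the job ends.

```yaml
# Sketch of a KubeRay RayJob whose logs, state, and metrics a
# history server could replay after the cluster is gone.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: train-job
spec:
  entrypoint: python /home/ray/samples/train.py  # assumed script path
  shutdownAfterJobFinishes: true                 # cluster torn down on completion
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.55.0
    workerGroupSpecs:
    - groupName: workers
      replicas: 2
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.55.0
```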

Join us at the booth

Whether you are scaling up next-gen AI inference, deploying highly isolated agentic workflows, or simply looking to optimize compute capacity across your clusters, we are committed to making Kubernetes and GKE the ultimate platform for your success.

If you’re at KubeCon Europe, stop by the Google Cloud booth (#310) to dive deep into these announcements and to discover our sessions, lightning talks, hands-on labs, and demos, plus a friendly competition with our text-based adventure game. Here’s to the future of Kubernetes!
