Technology Tutorials & Latest News | ByteBlock

Google Cloud AI infrastructure at NVIDIA GTC 2026

March 16, 2026

The era of agentic AI is fundamentally changing enterprise infrastructure needs. As organizations build systems capable of dynamic reasoning and autonomous execution, the underlying infrastructure must evolve as well. Scaling these agentic workloads alongside massive mixture-of-experts (MoE) architectures demands a deeply optimized co-engineered stack.

To meet these demands, we’ve built the Google Cloud AI Hypercomputer, an AI-optimized infrastructure-as-a-service that integrates performance-optimized hardware, leading software, open frameworks, and flexible consumption models into a single, cohesive system to deliver ultra-low-latency, high-throughput, and cost-effective inference. To give our customers even more options within this integrated architecture, we are expanding our partnership with NVIDIA.

This week at NVIDIA GTC 2026, Google Cloud and NVIDIA are expanding our partnership with a wave of new announcements, showcasing a co-engineered AI infrastructure foundation:

  • Infrastructure and hardware
    • Strong momentum for Google Cloud G4 VMs, powered by NVIDIA RTX Pro 6000 Server Edition
    • Preview of flexible, fractional G4 VMs using NVIDIA vGPU technology — a first in the industry for NVIDIA RTX Pro 6000 Server Edition
    • Upcoming support for the NVIDIA Vera Rubin NVL72 platform
  • Software and platform
  • Ecosystem

Let’s take a closer look at the announcements.

    Accelerating AI workloads with G4 VMs

    G4 VMs, powered by NVIDIA RTX Pro 6000 Server Edition GPUs, are built to power a diverse spectrum of high-performance workloads — from advanced spatial computing to complete AI development lifecycles. For instance, companies like Otto Group One.O and WPP use the G4 to run physically accurate simulations and real-time 3D rendering at scale.

    Beyond simulation, the G4 also shines in model fine-tuning and inference, particularly for models ranging from 30B to more than 100B parameters. By leveraging 4-bit floating point (FP4) precision and Google’s peer-to-peer (P2P) communication, customers are achieving higher throughput for model serving and considerable latency reductions, enabling a new class of real-time, multimodal AI agents and highly responsive generative AI applications.
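To see why FP4 matters for the 30B–100B parameter range mentioned above, here is a back-of-envelope sketch of the weight memory footprint at different precisions. This is illustrative arithmetic only (weights alone; KV cache, activations, and runtime overhead add more):

```python
# Back-of-envelope memory footprint for model weights at different precisions.
# Illustrative only: real deployments also need KV-cache and activation memory.

def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for params in (30, 70, 100):
    fp16 = weight_memory_gb(params, 16)
    fp4 = weight_memory_gb(params, 4)
    print(f"{params}B params: FP16 = {fp16:.0f} GB, FP4 = {fp4:.0f} GB "
          f"({fp16 / fp4:.0f}x smaller)")
```

A 30B-parameter model shrinks from roughly 60 GB of weights at FP16 to 15 GB at FP4, which is what lets larger models fit on fewer GPUs and raises serving throughput.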

    Here are some examples of how customers are already leveraging the performance and efficiency of G4 VMs to accelerate their most demanding workloads:

    “Google Cloud’s G4 VMs give us the scalable GPU backbone we need to push billions of miles of photorealistic simulation through our pipeline. The 4x lift in throughput means our ML teams can iterate faster, train on richer data, and validate edge cases long before our models ever see the real world.” – Sony Mohapatra, Director, AI/ML Engineering, General Motors

    “Now with G4 VMs powered by NVIDIA Blackwell, we’re pushing our multimodal models even further — faster inference, better reliability, instant replies across languages. The goal stays the same: making voice agents that work at enterprise scale without compromise. We are excited to keep building together and see what our customers deploy with this.” – Mati Staniszewski, Cofounder, ElevenLabs

    “Google Cloud G4 VMs provide the computational backbone for our Robotic Coordination Layer, allowing us to synchronize autonomous fleets across our logistics centers with millisecond precision. By simulating complex warehouse environments in a high-fidelity digital twin, we can optimize our entire supply chain virtually before a single robot moves on the floor.” – Dr. Stefan Borsutzky, CEO of Otto Group One.O

    “After transitioning to G4 VMs, we achieved a 50% reduction in processing latency and 6x increase in throughput just by updating our Terraform scripts. It’s rare to get that kind of performance boost for our core workloads without adding any operational overhead.” – Alfonso Acosta, Head of Engineering, Imgix

    Introducing fractional G4 VMs 

    We are excited to announce the preview of fractional G4 VMs, providing a highly efficient and cost-effective entry point for AI and graphics workloads. These new configurations, using NVIDIA virtual GPU (vGPU) technology, allow you to leverage the power of the NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs in flexible, smaller increments, so you can right-size your infrastructure to match the specific demands of your applications.

    By providing more granular access to advanced hardware, fractional G4 VMs let you optimize resource allocation and reduce overhead without sacrificing performance. You can now select from additional GPU slice sizes for your specific needs:

    • 1/2 GPU: Ideal for more intensive tasks such as LLM inference, robotics sensor simulation, and high-fidelity 3D rendering.

    • 1/4 GPU: Optimized for mainstream workloads, including mid-range creative design, video transcoding, and real-time data visualization.

    • 1/8 GPU: Great for lightweight applications such as remote desktops, productivity tools, and entry-level streaming services.
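One way to think about choosing among these slice sizes is to pick the smallest fraction whose share of GPU memory covers the workload. The helper below is a hypothetical sketch, not a Google Cloud API; the 96 GB per-GPU figure for the RTX PRO 6000 Blackwell Server Edition is an assumption of this example:

```python
# Hypothetical helper: map a workload's GPU-memory need to the smallest
# fractional G4 slice that fits. Slice fractions are from the post; the
# 96 GB full-GPU memory figure is an assumption of this sketch.

FULL_GPU_GB = 96
SLICES = [1/8, 1/4, 1/2, 1.0]  # available fractions, smallest first

def pick_slice(needed_gb: float) -> float:
    """Return the smallest slice fraction whose memory share covers needed_gb."""
    for frac in SLICES:
        if frac * FULL_GPU_GB >= needed_gb:
            return frac
    raise ValueError("workload needs more than one full GPU")

print(pick_slice(10))   # lightweight remote desktop -> 0.125 (1/8 GPU)
print(pick_slice(40))   # quantized LLM inference   -> 0.5 (1/2 GPU)
```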

    This flexible portfolio of G4 sizes lets you:

    • Right-size infrastructure: Precisely match GPU capacity to application demands, ranging from lightweight remote desktops to intensive data processing.

    • Maximize cost efficiency: Lower operational overhead by utilizing — and paying for — only the fractional GPU resources you need for specific tasks.

    • Scale diverse workloads: Power a broad spectrum of innovation, from high-fidelity creative design and streaming to complex robotics simulations and real-time inference.

    These fractional G4 VMs can be managed by Google Kubernetes Engine (GKE), allowing developers to use advanced container binpacking to achieve even higher price-performance and resource utilization. When managed through Dynamic Workload Scheduler, you can set fallback priorities for fractional slices. This significantly improves obtainability by allowing the scheduler to automatically find available GPU configurations for each workload.
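The binpacking idea mentioned above can be illustrated with a toy first-fit-decreasing packer: sort fractional requests from largest to smallest and place each on the first GPU with enough free capacity. This is a sketch of the general technique, not GKE's actual scheduler:

```python
from fractions import Fraction

# Toy first-fit-decreasing binpacking of fractional GPU requests (1/2, 1/4,
# 1/8) onto whole GPUs. Illustrates the idea behind container binpacking;
# this is NOT GKE's real scheduling algorithm.

def pack(requests):
    """Pack fractional GPU requests onto GPUs; return the load on each GPU."""
    free = []  # remaining capacity per GPU
    for req in sorted(requests, reverse=True):
        for i, cap in enumerate(free):
            if cap >= req:            # first GPU with room wins
                free[i] = cap - req
                break
        else:
            free.append(Fraction(1) - req)  # open a new GPU
    return [Fraction(1) - cap for cap in free]

reqs = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4), Fraction(1, 4),
        Fraction(1, 8), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8)]
loads = pack(reqs)
print(f"{len(loads)} GPUs used, loads: {[str(l) for l in loads]}")
```

Here eight mixed-size requests totaling exactly 2.0 GPUs pack onto two fully utilized GPUs; without fractional slices, the same workloads would occupy eight.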

    “The G4 vGPU’s flexible sizing allows us to precisely tailor compute resources to the scale of each molecular simulation, ensuring maximum efficiency across our drug discovery pipeline. This granular control means our researchers can seamlessly pivot between smaller workflows and massive parallel processing without being constrained by fixed hardware configurations.” – Shane Brauner, EVP, CIO, Schrödinger

    Scaling AI Hypercomputer with NVIDIA Vera Rubin NVL72

    Building on our deep engineering partnership with NVIDIA, we’re proud to support the successor to the NVIDIA Blackwell architecture, the recently announced NVIDIA Vera Rubin platform. We plan to be among the first cloud providers to offer NVIDIA Vera Rubin NVL72 rack-scale systems in the second half of 2026, integrating them into our AI Hypercomputer architecture to empower the next generation of reasoning and agentic AI.

    Delivering efficiency across the AI infrastructure stack 

    As part of our commitment to a fully open ecosystem, we are excited to announce the integration of Dynamo and GKE Inference Gateway. This integration provides a modular, open-source control plane across the application layer and the hardware. By combining Dynamo with Inference Gateway on GKE, teams can tailor their infrastructure to their exact needs, allowing them to extract the maximum ROI from accelerators, accelerate time-to-market for new AI models, and future-proof their deployments.

    You can learn to maximize performance for massive MoE architectures through new advanced scaling recipes for A4X VMs (powered by NVIDIA GB200 NVL72 and Dynamo). These configurations show how to overcome memory and interconnect bottlenecks when running AI inference workloads on AI Hypercomputer.
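The memory bottleneck for MoE inference comes from a simple asymmetry: only a few experts fire per token, but every expert's weights must be resident somewhere in the system. The rough arithmetic below makes that concrete; the model dimensions are illustrative, not those of any specific model:

```python
# Why MoE inference is memory- and interconnect-bound: only top_k experts run
# per token, but ALL expert weights must be stored. Dimensions are made up
# for illustration; counts cover the expert FFN layers only.

def moe_params_billion(layers, d_model, d_ff, n_experts, top_k):
    """Rough FFN-only parameter counts (in billions) for an MoE transformer."""
    per_expert = 2 * d_model * d_ff          # up- and down-projection weights
    total = layers * n_experts * per_expert  # must be stored across the system
    active = layers * top_k * per_expert     # actually used for each token
    return total / 1e9, active / 1e9

total, active = moe_params_billion(layers=60, d_model=7168, d_ff=2048,
                                   n_experts=128, top_k=8)
print(f"stored: {total:.0f}B params, active per token: {active:.0f}B")
```

With these numbers, roughly 225B parameters must be held in memory to serve about 14B active parameters per token, which is why sharding experts across a high-bandwidth rack-scale domain like GB200 NVL72 matters.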

    We are also enhancing resource obtainability through the Dynamic Workload Scheduler, with Calendar Mode and Flex Start for A4X and A4X Max (powered by NVIDIA GB300 NVL72), as well as new Flex Start support for G4 VMs. Dynamic Workload Scheduler lets you reserve the precise capacity that you need, or use flexible start windows. 

    Snap, a long-time Google Cloud customer, achieved significant cost savings by migrating two of its primary data processing pipelines to Google Cloud G2 VMs powered by NVIDIA L4 Tensor Core GPUs. This was made possible by leveraging Spark on GKE alongside NVIDIA’s new cuDF libraries, which automated the optimization of its shuffle-heavy workloads for optimal GPU efficiency. Learn more at GTC session S81678. 

    Advancing Vertex AI training and Model Garden 

    We are meeting the demands of next-generation AI with two major infrastructure advancements to Vertex AI training clusters. First, support for A4X VM domains lets you leverage Vertex AI’s managed infrastructure and framework capabilities for massive-scale training on NVIDIA GB200 NVL72 rack-scale systems. To ensure these intensive workloads remain uninterrupted, new hardware resiliency capabilities let you apply configurable, proactive fault detection scans, which identify and mitigate potential hardware issues before they can disrupt critical “hero” training runs. These capabilities enable higher goodput and help ensure that multi-week training jobs stay on track without costly restarts.
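The goodput connection can be quantified with a standard checkpoint-overhead model (Young's approximation for the optimal checkpoint interval). The failure rates and checkpoint time below are illustrative assumptions, not Vertex AI measurements:

```python
import math

# Goodput = fraction of wall-clock time spent on useful training after
# subtracting checkpoint overhead and expected lost work from failures.
# Uses Young's approximation for the optimal checkpoint interval.
# MTBF and checkpoint-time values are illustrative assumptions.

def goodput(mtbf_hours: float, ckpt_minutes: float) -> float:
    ckpt_h = ckpt_minutes / 60
    tau = math.sqrt(2 * ckpt_h * mtbf_hours)  # optimal interval between checkpoints
    overhead = ckpt_h / tau                   # fraction spent writing checkpoints
    rework = tau / 2 / mtbf_hours             # expected fraction of work redone
    return 1 - overhead - rework

for mtbf in (6, 24, 96):  # mean hours between hardware failures
    print(f"MTBF {mtbf:>3}h -> goodput about {goodput(mtbf, ckpt_minutes=5):.1%}")
```

Raising the mean time between failures from 6 to 96 hours lifts goodput from roughly 83% to 96% in this model, which is why catching faulty hardware before it interrupts a run pays off directly in training throughput.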

    “We are setting a new standard for the agentic enterprise — delivering highly capable, consistent, accurate, and responsive AI agents with Google and NVIDIA. By leveraging Vertex AI training clusters on NVIDIA GB200 NVL72 to power our Agentforce 360 Platform, we’ve eliminated infrastructure bottlenecks to keep our GPUs fully saturated. This high-performance, resilient architecture allows our researchers to focus on innovation at scale, driving substantial gains for our most complex reasoning workloads.” – Silvio Savarese, Chief Scientist, Salesforce

    At the same time, we continue to broaden Vertex AI Model Garden with support for NVIDIA’s Nemotron 3 family of open models. These include the Nemotron 3 Nano, featuring one-click deployment to simplify integration into private VPCs. We’ve also expanded our catalog to include the NVIDIA Nemotron 3 Super 120B model for immediate access to high-performance, large-scale reasoning. To maximize the value of these models, we’ve integrated NVIDIA’s latest performance libraries directly into Vertex AI to optimize popular open-source models on NVIDIA TensorRT-LLM. 

    Empowering public sector AI startups 

    To foster continued innovation within the ecosystem, Google Public Sector and NVIDIA are launching an AI startup accelerator program. This year-long initiative will support a select cohort of AI-focused Independent Software Vendors (ISVs) building solutions for the public sector.

    Participants gain dual access to both NVIDIA Inception and Google Cloud’s ISV accelerator resources. Kicking off at GTC and continuing through Google Cloud Next, this joint program will equip emerging technology leaders with the co-engineered infrastructure, technical guidance, and go-to-market support required to scale mission-critical public sector applications. To learn more about the program, please complete the interest form. Additional cohorts will be selected and announced in the future.

    Co-engineering collaboration powers every layer of the AI stack

    The transition to complex, agentic AI demands more than just raw compute. It requires a fully optimized, co-engineered stack. By integrating flexible hardware like fractional G4 instances and the upcoming Vera Rubin platform into our AI Hypercomputer architecture, and pairing it with deep software co-engineering, we provide the scale, resilience, and efficiency you need to turn your most ambitious AI visions into reality.

    Coming to GTC? Stop by booth #513 to learn more and talk to our team. And you can always learn more about our collaboration with NVIDIA at cloud.google.com/NVIDIA.
