Technology Tutorials & Latest News | ByteBlock

AI infrastructure at Next ‘26

April 23, 2026

Fueling agentic logic and reinforcement learning with Axion, Intel, and AMD

While GPUs and TPUs are great for training and serving AI models, they need to be complemented with high-performance CPU-based services to handle the complex logic, tool-calls, and feedback loops that surround the core AI model. Our new Axion-powered N4A CPU instances deliver outstanding price-performance for these agent runtimes. In fact, GKE Agent Sandbox with Google Axion N4A offers up to 30% better price-performance than agent workloads on other hyperscalers. This efficiency extends across our entire portfolio, including our 4th generation Compute Engine VM families, powered by the latest x86 instances from Intel and AMD. These are specifically optimized for the broadest range of RL tasks, such as RL reward calculation, agent orchestration, and nested visualization, providing the optimal capabilities for every AI workload. 
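The division of labor described above, accelerators for the model itself and CPUs for the surrounding logic, can be sketched as a toy agent loop. Everything below is an illustrative stand-in (no real Google Cloud or model-serving API is used):

```python
# Illustrative sketch of the CPU-side work an agent runtime does around a
# GPU/TPU-hosted model: tool dispatch and feedback loops. All names are
# hypothetical stand-ins; this does not call any Google Cloud API.

def fake_model(prompt: str) -> str:
    """Stand-in for a model served from an accelerator."""
    return "TOOL:add 2 3" if "sum" in prompt else "DONE:ok"

def run_tool(call: str) -> str:
    """CPU-side tool execution: the surrounding logic CPU instances handle."""
    name, *args = call.split()
    if name == "add":
        return str(sum(int(a) for a in args))
    raise ValueError(f"unknown tool {name!r}")

def agent_step(prompt: str) -> str:
    """One agent iteration: model call, tool handling, feedback loop."""
    out = fake_model(prompt)
    kind, _, payload = out.partition(":")
    if kind == "TOOL":
        # Feed the tool result back to the model and return its final answer.
        followup = fake_model(f"result={run_tool(payload)}")
        return followup.partition(":")[2]
    return payload

print(agent_step("what is the sum of 2 and 3?"))  # -> ok
```

The point of the sketch is that the model call is a small fraction of the loop; parsing, tool execution, and result plumbing are ordinary CPU work, which is why the runtime around the model benefits from high-performance CPU instances.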

Virgo Network for data center scale-out fabric

As part of AI Hypercomputer, the Virgo Network is designed to meet the demanding requirements of modern large-scale AI workloads. Its collapsed fabric architecture with 4x the bandwidth of previous generations eliminates the “scaling tax” to deliver staggering peak computing power. This capacity helps the most ambitious AI workloads scale with near-linear efficiency.

With Virgo Network and TPU 8t, we can connect 134,000 TPUs into a single fabric in a single data center, and connect more than one million TPUs across multiple data center sites into a training cluster — essentially transforming globally distributed infrastructure into one seamless supercomputer. 

We are also making Virgo Network available for A5X (powered by NVIDIA Vera Rubin NVL72), supporting up to 80,000 GPUs in a single data center, and up to 960,000 GPUs across multiple sites.

Storage: Minimizing data bottlenecks

A massive compute cluster is only as effective as the storage system feeding it data. To keep storage from becoming a bottleneck as compute gets faster, we are delivering four key storage advancements that let you:

  • Accelerate training and inference: Google Cloud Managed Lustre now delivers 10 TB/s of bandwidth — a 10x improvement over last year and up to 20x faster than other hyperscalers. We’ve also increased its capacity to 80 petabytes. These advancements are powered by our new C4NX instances and Hyperdisk Exapools. 

  • Minimize latency: Managed Lustre can leverage new TPUDirect and RDMA to allow data to bypass the host, moving directly to the accelerators. By removing this processing overhead, your AI agents can respond with the near-instant speed users need. 

  • Maintain peak utilization for training: Rapid Buckets on Google Cloud Storage transforms object storage with sub-millisecond latency and 20 million operations per second. This helps ensure large-scale training checkpoints and recoveries happen near-instantly, allowing your accelerators to maintain 95% utilization or higher. That shortens training cycles and makes cost-effective use of valuable TPUs and GPUs.

  • Build custom solutions: For ISVs and organizations that want to build storage solutions, we are launching the Z4M instance, specifically engineered for customers who want to integrate trusted parallel file systems like Vast Data or Sycomp. Each Z4M instance scales to a massive 168 TiB of local SSD capacity and can be deployed in RDMA clusters of thousands of machines. 
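The utilization point above comes down to simple arithmetic: accelerators sit idle while a synchronous checkpoint stalls the training loop. A back-of-envelope sketch, using assumed (not measured) step and stall times:

```python
# Back-of-envelope model of how checkpoint stalls cap accelerator
# utilization. The step times and stall durations below are illustrative
# assumptions, not measured values from any system.

def utilization(step_s: float, ckpt_stall_s: float, steps_per_ckpt: int) -> float:
    """Fraction of wall-clock time the accelerators spend computing."""
    compute = step_s * steps_per_ckpt
    return compute / (compute + ckpt_stall_s)

# Slow storage: a 60 s synchronous checkpoint every 100 steps of 2 s each.
slow = utilization(step_s=2.0, ckpt_stall_s=60.0, steps_per_ckpt=100)
# Fast checkpointing: the same cadence with only a 5 s stall.
fast = utilization(step_s=2.0, ckpt_stall_s=5.0, steps_per_ckpt=100)

print(f"slow: {slow:.1%}, fast: {fast:.1%}")  # slow: 76.9%, fast: 97.6%
```

Cutting the stall is what moves a cluster from the mid-70s into the 95%-plus utilization range, independent of how fast the accelerators themselves are.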

Together, these options round out a comprehensive storage portfolio, giving you the raw power of the AI Hypercomputer stack with the optimal storage service for each use case.

GKE: Orchestration for agent-native workloads

In the agentic era, intelligence is only as effective as the speed at which it can be scaled. So, we’ve transformed GKE to serve as the premier orchestration engine for agent-native workloads.

Reducing latency across the stack
To keep agents responsive, we optimize every millisecond of the start-up and scale-out process. By streamlining how infrastructure responds to surges in demand, GKE ensures that your agents are ready the moment a user engages with the system. New in GKE are:

  • Accelerated node and pod startup: GKE nodes now start up to 4x faster, while pod startup times have been slashed by up to 80%.

  • Rapid model loading: Leveraging the Run:ai Model Streamer and Rapid Cache in Google Cloud Storage, models now load 5x faster, removing a traditional storage bottleneck.

Intelligent routing with AI-powered Inference Gateway
Building on last year’s introduction of GKE Inference Gateway, we are using “AI for AI” to solve the complexities of serving at scale. 

Inference Gateway’s new predictive latency boost replaces heuristic guesswork with machine learning-driven, real-time capacity-aware routing. This intelligent orchestration cuts time-to-first-token (TTFT) latency by more than 70% without manual tuning. For businesses, this translates directly into more natural voice conversations and smooth, real-time interactions across a range of use cases. 
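The core idea of capacity-aware routing can be illustrated with a toy sketch: route each request to the replica with the lowest *predicted* time-to-first-token rather than round-robin. Here queue depth stands in for a learned latency predictor, and all names are hypothetical, not the actual Inference Gateway API:

```python
# Toy capacity-aware router: pick the replica with the lowest predicted
# TTFT. Queue depth is a crude stand-in for the ML-driven predictor
# described above; names and numbers are illustrative only.

from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    base_ttft_ms: float   # time-to-first-token when idle
    queue: int = 0        # requests already in flight

    def predicted_ttft_ms(self) -> float:
        # Naive model: each queued request adds one full TTFT of waiting.
        return self.base_ttft_ms * (1 + self.queue)

def route(replicas: list[Replica]) -> Replica:
    """Capacity-aware choice: lowest predicted TTFT wins."""
    best = min(replicas, key=lambda r: r.predicted_ttft_ms())
    best.queue += 1
    return best

replicas = [Replica("a", 120.0, queue=4), Replica("b", 150.0, queue=0)]
choice = route(replicas)
print(choice.name)  # "b": 150 ms predicted beats 120 * 5 = 600 ms
```

A plain round-robin or "fastest hardware" heuristic would have sent the request to replica "a" despite its deep queue; routing on predicted latency is what recovers the tail-latency wins the paragraph above describes.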

Inference Gateway can be deployed alongside llm-d, a Kubernetes-native high-performance distributed LLM inference framework, which was recently accepted as a Cloud Native Computing Foundation (CNCF) Sandbox project. Google Cloud is proud to be a founding contributor to llm-d alongside Red Hat, IBM Research, CoreWeave, and NVIDIA, uniting around a clear, industry-defining vision: any model, any accelerator, any cloud.

© 2024 Byte Block - Tech Insight: Tutorials, Reviews & Latest News. Made By Huwa.
