Claude Code, Anthropic’s agentic coding tool, has worked with Google Cloud for a while now. An individual developer can set CLAUDE_CODE_USE_VERTEX=1, point Claude Code at a Google Cloud (GCP) project, grant the role roles/aiplatform.user, and inference stays inside that project’s Google Cloud perimeter.
That flow works great when it’s just you or a handful of engineers. Rolling it out across an organization is where the friction starts: you have to manage per-developer cloud credentials and push a managed-settings.json to every laptop over MDM, and you still end up with no verified per-developer usage attribution and no easily enforceable spend caps.
The Claude apps gateway closes that gap. It is a self-hosted service, shipped with the same claude binary, that sits directly between your local Claude Code clients and Google Cloud. This post breaks down exactly why you should run it and what a secure deployment looks like on Google Cloud.
(Note: If you want to jump straight to the code, the full walkthrough lives in the Claude apps gateway on Google Cloud docs.)
Why run the gateway
Run the gateway to centralize the governance that developers and platform admins otherwise each carry alone: identity, policy, telemetry, cost, and routing. Here’s what that looks like in practice.
Identity. The /login request routes through your identity provider (IdP), either Google Workspace or any other OpenID Connect (OIDC) provider, and the gateway swaps the resulting token for a short-lived session. Nothing sensitive lands on the developer’s laptop: no service-account keys, no API keys, no ANTHROPIC_VERTEX_PROJECT_ID. Onboarding means adding a user to an IdP group; offboarding means removing them, after which their next session refresh fails on the spot.
Policy. Your RBAC (role-based access control) rules live once in gateway.yaml, resolved per group and enforced server-side. The gateway re-checks availableModels on every /v1/messages call, so editing a local managed-settings.json changes nothing, and rule updates reach the whole fleet within the hour.
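To make the server-side check concrete, here is a minimal sketch of the idea. It is not the gateway’s code and does not reflect the real gateway.yaml schema: the POLICY mapping, the group names, the model identifiers, the function names, and the choice of a 403 for a denied model are all assumptions made for illustration. The only thing taken from the text above is the principle that the allowed models are resolved from the session’s verified groups on every call.

```python
from typing import Iterable

# Hypothetical per-group policy standing in for rules that would live in
# gateway.yaml. Group names and model identifiers are illustrative only.
POLICY: dict[str, set[str]] = {
    "eng-all": {"model-small"},
    "eng-platform": {"model-small", "model-large"},
}


def allowed_models(groups: Iterable[str]) -> set[str]:
    """Union of the models allowed for every group on the verified session."""
    allowed: set[str] = set()
    for group in groups:
        allowed |= POLICY.get(group, set())
    return allowed


def authorize(session_groups: Iterable[str], requested_model: str) -> int:
    """Return an HTTP-style status for one request.

    Only groups taken from the verified session are consulted. Nothing the
    client reports about its own local settings is an input here, which is
    why editing a local settings file cannot widen access.
    The 403 for a denied model is an assumption of this sketch.
    """
    return 200 if requested_model in allowed_models(session_groups) else 403
```

Because the decision takes no client-supplied settings as input, a developer in only the hypothetical eng-all group is refused model-large no matter what their laptop’s configuration says.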
Telemetry. Every claude_code.token.usage metric carries the verified email and groups from the session JWT (the signed session token), not the spoofable, client-set OTEL_RESOURCE_ATTRIBUTES. The gateway ships these metrics over OTLP/HTTP to a collector you run: Cloud Monitoring, Grafana, Datadog, whatever you use.
Spend limits. Set daily, weekly, or monthly caps per user, group, or org via the admin API; the gateway meters tokens against a Cloud SQL ledger and returns a 429 once a cap is reached. Costs are computed at list price, so treat the caps as a runaway-usage guardrail, not as bill reconciliation (committed-use discounts and negotiated rates don’t show up).
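The metering logic can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway’s implementation: the Cap class and both functions are hypothetical names, an in-memory object stands in for the Cloud SQL ledger, and the per-million-token prices passed in are placeholders rather than real list prices.

```python
from dataclasses import dataclass
from decimal import Decimal

TOKENS_PER_MTOK = Decimal(1_000_000)


@dataclass
class Cap:
    """In-memory stand-in for one ledger row (hypothetical structure)."""
    limit_usd: Decimal
    spent_usd: Decimal = Decimal("0")


def list_price_cost(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: Decimal,
    output_usd_per_mtok: Decimal,
) -> Decimal:
    """Cost of one call at list price; discounts are deliberately ignored."""
    total = (
        Decimal(input_tokens) * input_usd_per_mtok
        + Decimal(output_tokens) * output_usd_per_mtok
    )
    return total / TOKENS_PER_MTOK


def admit(cap: Cap) -> int:
    """Return 429 once recorded spend has reached the cap, otherwise 200."""
    return 429 if cap.spent_usd >= cap.limit_usd else 200
```

A caller would check admit() before forwarding a request and add list_price_cost() to spent_usd afterwards. Using Decimal avoids floating-point drift in the running total, which matters when the comparison against the cap decides whether a request is refused.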
Routing. Calls go out under a single Cloud Run service identity. Set region: global for Agent Platform’s global endpoint, or add a second upstreams: entry to fail over on 5xx/429/timeout in list order. Either way, inference stays in your GCP project, with quota, Data Processing Agreement, and billing all unchanged.
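Failover in list order can be pictured roughly as follows. This is a sketch, not the gateway’s routing code: send_with_failover, should_fail_over, and UpstreamTimeout are hypothetical names, the send callable stands in for the real upstream call, and reporting a timeout as 504 when every upstream fails is an assumption of this example.

```python
from typing import Callable, Sequence


class UpstreamTimeout(Exception):
    """Raised by the send callable when an upstream does not answer in time."""


def should_fail_over(status: int) -> bool:
    """True for the statuses named in the text: any 5xx, or 429."""
    return status == 429 or 500 <= status <= 599


def send_with_failover(
    upstreams: Sequence[str],
    send: Callable[[str], int],
) -> tuple[str, int]:
    """Try each upstream in list order; return (upstream, status).

    The first response that is not 5xx/429 wins. If every upstream fails,
    the last failure is returned so the caller can see what went wrong.
    """
    if not upstreams:
        raise ValueError("at least one upstream is required")
    last: tuple[str, int] = (upstreams[0], 504)
    for name in upstreams:
        try:
            status = send(name)
        except UpstreamTimeout:
            last = (name, 504)  # assumption: surface a timeout as 504
            continue
        if not should_fail_over(status):
            return name, status
        last = (name, status)
    return last
```

Note that other 4xx responses, such as a 400 for a malformed request, are returned immediately rather than retried, since a second upstream would reject the same request for the same reason.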
How it fits together
A developer’s local or deployed claude process sends inference traffic to the gateway over HTTPS. The gateway is a stateless container on Cloud Run as shown below.






