Cloud Run has long provided developers with a straightforward, opinionated platform for running code. You can easily deploy request-driven web applications using Cloud Run services, or execute run-to-completion batch processing with Cloud Run jobs. However, as developers build more complex applications, like pipelines that process continuous streams of data or distributed AI workloads, they need an environment designed for continuous, background execution.
Estée Lauder Companies got just that with Cloud Run worker pools, which extend Cloud Run beyond web workloads and background tasks to pull-based workloads. Cloud Run worker pools are now generally available.
Estée Lauder Companies’ Rostrum platform is a polymorphic chat service for LLM-powered applications that originally ran as a standalone Cloud Run service. While this simple architecture worked for internal tools with predictable traffic, the team faced a major hurdle ahead of the upcoming holiday shopping season: consumer-facing traffic. To launch their first consumer-facing generative AI application, Jo Malone London’s AI Scent Advisor, they needed an architecture that could sustain the load of AI prompts from thousands of simultaneous users.
In just a few weeks, Estée Lauder Companies migrated to a producer-consumer model using Cloud Run worker pools. The web tier, a FastAPI application deployed as a Cloud Run service, acts as the producer, instantly publishing user messages to Cloud Pub/Sub. The worker pool deployments act as “always-on” consumers, pulling messages from the queue to handle LLM inference.
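The producer-consumer split described above can be sketched in miniature. This is an illustrative stand-in, not Estée Lauder Companies’ code: an in-process queue.Queue plays the role of the Pub/Sub topic, and a stub function stands in for the LLM inference a worker pool instance would perform.

```python
import queue
import threading

# Stand-in for the Pub/Sub topic. In the real architecture, the producer
# is a FastAPI Cloud Run service publishing to Pub/Sub, and the consumers
# are Cloud Run worker pool instances reading from a pull subscription.
message_queue: "queue.Queue[dict]" = queue.Queue()
results: dict[str, str] = {}

def produce(session_id: str, text: str) -> None:
    """Web tier: enqueue the user message and return immediately."""
    message_queue.put({"session_id": session_id, "text": text})

def consume() -> None:
    """Worker tier: pull messages and run the (stubbed) LLM inference."""
    while True:
        msg = message_queue.get()
        # Stub for the LLM call the worker pool would actually make.
        results[msg["session_id"]] = f"reply to: {msg['text']}"
        message_queue.task_done()

# An always-on consumer, analogous to a worker pool instance.
worker = threading.Thread(target=consume, daemon=True)
worker.start()

produce("s1", "Which scent suits a winter evening?")
message_queue.join()  # wait until the worker has processed the message
print(results["s1"])
```

Because the producer returns as soon as the message is enqueued, user-facing latency is independent of how long inference takes, which is the property the decoupling is meant to buy.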
By decoupling the user-facing web tier from LLM operations, Estée Lauder Companies achieved:
- 100% message durability: Pub/Sub acts as a buffer, so no user message is lost even during holiday spikes.
- Strong UI latency SLAs: Server-side rendering is decoupled from message-processing load.
- Minimal operational overhead: The team spent virtually no time managing servers, allowing them to focus on the user experience rather than infrastructure.
This modular architecture now serves as the blueprint for Estée Lauder Companies to rapidly launch specialized AI advisors across its diverse house of brands.
“The Jo Malone London AI Scent Advisor chains multiple LLM and tool calls — conversational discovery, deterministic scoring, copy generation — in a pipeline that had to run reliably at consumer scale without us managing infrastructure. Cloud Run worker pools was exactly the right primitive, and working directly with the product team as early adopters gave us the confidence to build on it ahead of GA. It’s now the foundation for us to bring AI advisors to brands across the Estée Lauder Companies portfolio.” – Chris Curro, Principal Machine Learning Engineer, The Estée Lauder Companies