The outcome:
Ramble has come to rely on the quality of Google’s AI models, particularly the reasoning and near-instant audio-processing capabilities of Gemini Flash. Other platforms and models offer similar capabilities, and we did bake in support for them, but none hit our internal quality bar as consistently as Gemini. When it came to interpreting a user’s unstructured “rambling” and filling in the gaps, Gemini proved the most capable of the models we explored. The result was the clearest and most consistent breakdown of tasks — exactly the magical user experience we wanted to create.
After an early rate-limit incident caused by unexpectedly high usage during alpha testing, we developed a deeper, more proactive partnership with Google, ensuring long-term sustainability and the support necessary for our high API usage. Since then, it’s been easy for us to connect directly with Google Cloud staff, including engineers, when issues arise.
Here at Doist, Ramble took off in both a qualitative and a quantitative sense. It’s become a hallmark experience that motivates us to explore tasteful applications of AI that enhance our existing product, in both the B2C and B2B spaces. Beyond task creation, we’re considering several opportunities across the productivity journey, from capture to planning and even automation.
The details:
We structured our back-end to enable future voice-powered features. The architecture includes a provider-agnostic streaming layer; a dictation module for one-way audio; Ramble (our “brain dump” module); and a conversation module to support streaming bi-directional audio and future conversational features.
This layered design means we can ship new voice features with minimal additional infrastructure work. It also gives us provider flexibility: although we’re using Gemini Enterprise Agent Platform in production, our abstraction layer easily supports other solutions.
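The provider-agnostic layer can be pictured as a small interface that each backend implements, plus a registry the higher-level modules (dictation, Ramble, conversation) resolve providers through. This is a minimal sketch, not our actual code — `StreamingProvider`, `open_stream`, and the byte-counting stub are all hypothetical names:

```python
from abc import ABC, abstractmethod

class StreamingProvider(ABC):
    """Contract every audio backend (Gemini or otherwise) must satisfy."""

    @abstractmethod
    def send_audio(self, chunk: bytes) -> None:
        """Push one chunk of caller audio to the provider."""

    @abstractmethod
    def close(self) -> dict:
        """Finish the stream and return a provider summary."""

# Registry mapping provider names to implementations.
PROVIDERS: dict[str, type[StreamingProvider]] = {}

def register(name: str):
    """Class decorator that adds a provider to the registry."""
    def deco(cls):
        PROVIDERS[name] = cls
        return cls
    return deco

@register("gemini")
class GeminiStreamingProvider(StreamingProvider):
    """Stand-in implementation; a real one would hold a network session."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def send_audio(self, chunk: bytes) -> None:
        self.buffer.extend(chunk)

    def close(self) -> dict:
        return {"provider": "gemini", "bytes_received": len(self.buffer)}

def open_stream(provider_name: str) -> StreamingProvider:
    """Feature modules depend on this factory, never on a concrete class."""
    return PROVIDERS[provider_name]()
```

Because callers only see `StreamingProvider`, swapping Gemini for another solution is a registry change rather than a rewrite of the dictation, Ramble, or conversation modules.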
In addition to helping us tackle three of our four key technical challenges, Agent Platform delivered some nice surprises. First, session resumption was easier than we expected. We initially thought maintaining conversation state across reconnections would require complex server-side session management. But once we understood Agent Platform’s resumption token approach (the token is provided by the API and changes with each context update), implementation was straightforward across all platforms.
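In practice, the resumption-token approach reduces to keeping only the newest token and replaying it on reconnect. A hypothetical sketch — the event shape and the `resumption_token` field name are illustrative, not the actual Agent Platform wire format:

```python
class ResumableSession:
    """Tracks the latest resumption token across a streaming session.

    The server sends a fresh token with each context update, so only the
    most recent one matters; no server-side session store is needed on
    our end.
    """

    def __init__(self) -> None:
        self._token: str | None = None

    def on_server_event(self, event: dict) -> None:
        # Each context update carries a new token; overwrite the old one.
        token = event.get("resumption_token")
        if token:
            self._token = token

    def connect_params(self) -> dict:
        # First connect: no token. Reconnect: pass the last token so the
        # model restores conversation state where it left off.
        return {"resume": self._token} if self._token else {}
```

The same handful of lines works on every client platform, which is why implementation turned out to be straightforward once the token lifecycle was clear.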
Second, context injection worked on the first try. We spent considerable time designing how to provide user context (projects, labels, preferences) to the model. We explored complex retrieval strategies and dynamic context windows. In the end, the simple “v1” approach—just passing most of the user’s metadata in the system prompt—worked remarkably well.
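That “v1” approach amounts to flattening the user’s metadata into the system prompt before the session starts. A minimal sketch, with illustrative field names rather than our real schema:

```python
def build_system_prompt(user: dict) -> str:
    """Flatten user metadata into the system prompt.

    The simple "v1" approach: no retrieval step, no dynamic context
    window -- just pass most of the user's metadata along directly.
    """
    lines = [
        "You turn the user's spoken brain dump into structured tasks.",
        f"Projects: {', '.join(user['projects'])}",
        f"Labels: {', '.join(user['labels'])}",
        f"Preferences: {user['preferences']}",
    ]
    return "\n".join(lines)
```

The trade-off is a longer prompt per request, but it avoids an entire retrieval subsystem — which is exactly why the simple version won.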
For testing, we combined structural validation (task count, priority levels, date presence, etc.) with semantic validation (did the model understand the user’s intent?) following the LLM-as-judge approach. A second Gemini model evaluates whether the output semantically matches the expected outcome. Native speakers from our global team recorded real-world scenarios in their languages and local accents (15+ language variations and over 100 recordings total), with each scenario having expected semantic outcomes (e.g., “should create 3 tasks: one about calling family, one about shopping, one about exercise on Saturday at 11 AM”).
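The two validation stages compose like this. A hedged sketch: the judge prompt, task field names, and PASS/FAIL convention are illustrative assumptions, and `judge` stands in for a call to the second Gemini model:

```python
def structural_check(tasks: list[dict], expected_count: int) -> bool:
    """Cheap deterministic checks: task count and required fields."""
    if len(tasks) != expected_count:
        return False
    return all("content" in t and "priority" in t for t in tasks)

JUDGE_PROMPT = (
    "Expected outcome: {expected}\n"
    "Model output: {actual}\n"
    "Answer PASS if the output semantically matches the expectation, "
    "otherwise answer FAIL."
)

def semantic_check(tasks: list[dict], expected_outcome: str, judge) -> bool:
    """LLM-as-judge: a second model grades semantic intent.

    `judge` is a callable wrapping the judge-model call (stubbed in tests).
    """
    verdict = judge(JUDGE_PROMPT.format(expected=expected_outcome, actual=tasks))
    return verdict.strip().upper().startswith("PASS")

def evaluate(tasks, expected_count, expected_outcome, judge) -> bool:
    """A scenario passes only if both stages agree."""
    return structural_check(tasks, expected_count) and semantic_check(
        tasks, expected_outcome, judge
    )
```

Running structural checks first keeps the cheap failures from ever reaching the (slower, paid) judge model.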
We then created a defined pass-rate threshold for the test suite overall, while also monitoring per-language performance to catch regressions. This approach lets us evaluate new model versions systematically, understanding not just overall performance but also which specific languages might see improved or degraded experiences, and make data-informed decisions.
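A minimal sketch of that gating logic, with placeholder threshold values (the real suite’s thresholds and result schema differ):

```python
from collections import defaultdict

def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate (language, passed) pairs into per-language pass rates."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for language, passed in results:
        counts[language][1] += 1           # total runs for this language
        counts[language][0] += int(passed)  # passing runs
    return {lang: p / t for lang, (p, t) in counts.items()}

def gate(results, overall_threshold=0.9, per_language_threshold=0.8) -> bool:
    """Fail the suite if overall quality or any single language regresses.

    Thresholds here are illustrative placeholders, not our real values.
    """
    rates = pass_rates(results)
    overall = sum(passed for _, passed in results) / len(results)
    regressions = [lang for lang, r in rates.items() if r < per_language_threshold]
    return overall >= overall_threshold and not regressions
```

The per-language check is what catches the case where a new model version improves the aggregate score while quietly degrading one specific language.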
Ultimately, Ramble is a resounding success in helping our users handle the chaos of day-to-day life. It joins Todoist’s Quick Add — our existing natural-language task input — as another best-in-class way to capture tasks.