Note that tags must be written in English, but they can be combined with text in other languages.
French language example:
[cautious] L’ombre avança lentement dans la pièce silencieuse. [whispers] Le document secret devait être caché ici. [short pause] Mais où? [gasp] Soudain, un bruit sourd résonna dans le couloir. [panic] Il fallait sortir d’ici immédiatement.
4. Directing expression and pacing
Gemini 3.1 Flash TTS supports 200+ audio tags for prompting expressive voices.
The most commonly used tags include: [determination], [enthusiasm], [adoration], [interest], [awe], [admiration], [nervousness], [frustration], [excitement], [curiosity], [hope], [annoyance], [amusement], [aggression], [tension], [agitation], [confusion], [anger], [positive], [neutral], [negative], [whispers], and [laughs].
Pacing and stylistic controls: use pacing tags such as [slow] or [fast] to control the speed of delivery. To space out information and let dramatic moments land, use pause tags such as [short pause] or [long pause].
Non-verbal vocalizations: the model can produce realistic non-verbal audio. You can insert tags like [laughs] or [whispers] to add texture to the audio output.
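Because tags are plain bracketed text inside the script, it can help to scan a script for them before sending it for synthesis. The sketch below is a minimal illustration in Python and does not call any Gemini API. The names NAMED_TAGS, find_tags, and tags_not_named_here are hypothetical, and NAMED_TAGS contains only the tags named in this section, not the full set of 200+ supported tags. A tag flagged by this check may therefore still be valid (for example, [gasp] in the French example above).

```python
import re

# Assumption: only the tags named in this section. The full list of 200+
# supported tags is not reproduced here, so this set is incomplete.
NAMED_TAGS = {
    "determination", "enthusiasm", "adoration", "interest", "awe",
    "admiration", "nervousness", "frustration", "excitement", "curiosity",
    "hope", "annoyance", "amusement", "aggression", "tension", "agitation",
    "confusion", "anger", "positive", "neutral", "negative", "whispers",
    "laughs", "slow", "fast", "short pause", "long pause",
}

# A tag is one or more lowercase words, separated by single spaces,
# enclosed in square brackets, e.g. [laughs] or [short pause].
TAG_PATTERN = re.compile(r"\[([a-z]+(?: [a-z]+)*)\]")


def find_tags(script: str) -> list:
    """Return the tag names found in the script, in order of appearance."""
    return TAG_PATTERN.findall(script)


def tags_not_named_here(script: str) -> list:
    """Return tags in the script that are not in NAMED_TAGS.

    These are worth double-checking for typos, but they are not
    necessarily unsupported.
    """
    return [tag for tag in find_tags(script) if tag not in NAMED_TAGS]


if __name__ == "__main__":
    sample = "[slow] Welcome back. [short pause] [laughs] Let's begin. [gasp]"
    print(find_tags(sample))
    print(tags_not_named_here(sample))
```

A check like this is mainly useful for catching misspelled tags such as [enthusiasim] early, since a mistyped tag is otherwise just ordinary text in the script.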
Use cases
1. Accessibility and inclusive design
Text-to-speech technology plays a vital role in making digital spaces accessible. Gemini 3.1 Flash TTS provides highly contextual, clear audio for individuals who rely on screen readers or augmentative and alternative communication devices.
Gaming soundtracks and descriptions
For players navigating game menus, audio descriptions need to be clear, inviting, and easy to follow.
[enthusiasm] You have selected the twilight forest level. [interest] This area features hidden artifacts and new challenges. It includes an expansive map, challenging puzzles, and a specialized survival kit.
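When menu descriptions are generated from game data, a tagged script like the one above can be assembled from (tag, text) pairs. The following is a minimal Python sketch of that idea. The function build_script is a hypothetical helper, not part of any Gemini SDK, and it only produces the script text. Passing that text to the TTS model is left out.

```python
from typing import List, Optional, Tuple


def build_script(segments: List[Tuple[Optional[str], str]]) -> str:
    """Join (tag, text) pairs into one tagged script string.

    A tag of None leaves the sentence untagged, so it is simply
    appended after the preceding tagged sentence.
    """
    parts = []
    for tag, text in segments:
        text = text.strip()
        parts.append(f"[{tag}] {text}" if tag else text)
    return " ".join(parts)


# Rebuild the menu description shown above from its segments.
script = build_script([
    ("enthusiasm", "You have selected the twilight forest level."),
    ("interest", "This area features hidden artifacts and new challenges."),
    (None, "It includes an expansive map, challenging puzzles, "
           "and a specialized survival kit."),
])

if __name__ == "__main__":
    print(script)
```

Keeping tags and text separate until the final join makes it easier to change the emotional direction of a line without touching its wording.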







