The Gemini embedding model uses a new quota control method known as Tokens Per Minute (TPM). For projects with a high reputation score, the default TPM value is 5 million; the maximum TPM a customer can set without manual approval is 20 million.
The total number of rows a job can process depends on the number of tokens per row. For example, with 300 tokens per row, a single job can process up to 12 million rows.
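To size a job yourself, multiply your TPM quota by the job's runtime in minutes to get a total token budget, then divide by the tokens per row. For instance, a job sustaining the default 5 million TPM for one hour has a budget of 300 million tokens, which at 300 tokens per row works out to 1 million rows.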
OSS embedding models in BigQuery ML
The OSS community is rapidly evolving the text-embedding model landscape, offering a wide spectrum of choices to fit any need. These range from top-ranking models like the recent Qwen3-Embedding and EmbeddingGemma to small, efficient, and cost-effective models such as multilingual-e5-small.
You can now use any of the 13,000+ Hugging Face text-embedding models deployed to Vertex AI Model Garden in BigQuery ML. To showcase this capability, our example uses multilingual-e5-small, which delivers respectable performance while being highly scalable and cost-effective, making it a strong fit for large-scale analytical tasks.
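To preview where the steps below lead, here is a minimal sketch of the end-to-end flow in BigQuery ML; the dataset, connection, and endpoint values are placeholders that you will fill in as you work through the steps.

```sql
-- Register the Vertex AI endpoint hosting the OSS model as a remote model.
-- The connection name and endpoint URL below are placeholders.
CREATE OR REPLACE MODEL `mydataset.multilingual_e5_small`
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (
    ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/1234567890'
  );

-- Generate embeddings with plain SQL. The input column must be named "content".
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.multilingual_e5_small`,
  (SELECT review_text AS content FROM `mydataset.reviews`)
);
```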
To use an open-source text-embedding model, follow these steps.
1. Host the model on a Vertex AI endpoint
First, choose a text-embedding model from Hugging Face, in this case the previously mentioned multilingual-e5-small. Then navigate to Vertex AI Model Garden > Deploy from Hugging Face. Enter the model URL and set Endpoint access to “Public (Shared endpoint)”. You can also customize the endpoint name, region, and machine specs to fit your requirements. The default settings provision a single replica of the machine type specified below: