Cloud Monitoring adds long-lookback alert policies for PromQL

When you compare a metric against a time-offset copy of itself, a single policy can reliably trigger on either drops or spikes, but not both. A policy that triggers on both sudden drops and sudden spikes can fire twice for a single anomaly.

Think of it this way: if traffic drops steeply today, your alert triggers immediately. However, exactly 24 hours later, today’s anomalous drop becomes tomorrow’s historical baseline. If your policy triggers on any anomalous difference (higher or lower), tomorrow’s sudden “return to normal” looks like a massive spike relative to yesterday’s dip, and you get a false alert for a phantom anomaly. You can see this in the above chart: the dip in the signal (blue line) reappears as a mirror-image spike exactly 24 hours later.

To prevent this, have each policy track either drops or spikes for a given metric, not both.
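To make the double-firing effect concrete, here is a minimal Python sketch that simulates a condition comparing a metric with its own value one day earlier, similar in spirit to a PromQL comparison against an `offset 1d` copy of the series. The hourly data, the single injected dip, the 0.5/2.0 ratio thresholds, and the helper names `make_traffic` and `alert_hours` are all illustrative assumptions, not Cloud Monitoring behavior or APIs.

```python
from typing import Optional

# Minimal sketch (illustrative data): compare each sample with the sample one
# period earlier, similar in spirit to a PromQL comparison against `offset 1d`.
PERIOD = 24  # samples per day, assuming hourly data


def make_traffic(days: int, dip_hour: int) -> list[float]:
    """Hypothetical helper: flat traffic of 100 with one steep drop to 20."""
    series = [100.0] * (days * PERIOD)
    series[dip_hour] = 20.0
    return series


def alert_hours(series: list[float], low: float, high: Optional[float]) -> list[int]:
    """Hypothetical helper: indices where current/offset falls below low or,
    if high is given, rises above high."""
    fired = []
    for t in range(PERIOD, len(series)):
        ratio = series[t] / series[t - PERIOD]  # now vs. same hour yesterday
        if ratio < low or (high is not None and ratio > high):
            fired.append(t)
    return fired


traffic = make_traffic(days=3, dip_hour=30)

two_sided = alert_hours(traffic, low=0.5, high=2.0)    # drops and spikes
drops_only = alert_hours(traffic, low=0.5, high=None)  # drops only

print("two-sided:", two_sided)    # two-sided: [30, 54]
print("drops only:", drops_only)  # drops only: [30]
```

Hour 54 is the phantom spike: by then, the dip at hour 30 has become the baseline, so ordinary traffic looks five times higher than "yesterday." The drops-only condition fires once, on the real anomaly.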

Control runaway costs using dynamic thresholds

Once you can trigger an alert based on deviations from a historical baseline, many interesting use cases open up. For example, you can use dynamic thresholds to prevent overspend on any Google Cloud service that exposes a metric roughly tracking spend.

Say you are concerned about runaway AI token costs. You could do the following:

- Configure a dynamic threshold alert that triggers if the most recent 10 minutes of accumulated input/output token usage is more than 25 times the one-week historical average. A threshold this high should only catch extreme anomalies (such as leaked API keys) that will definitely result in overspend.
- Route the alert to a Pub/Sub notification channel that pushes notifications to a Cloud Run function.
- Have that Cloud Run function run a workflow that uses the Cloud Quotas API to lower your Token Usage quota to 0, which immediately stops the overspend. Note that this also pauses legitimate token usage until you fix the problem, but at least you’ll stop the bleeding.
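The threshold in the first step boils down to a simple comparison. The minimal Python sketch below shows that arithmetic on made-up data. In Cloud Monitoring you would express it as a PromQL condition instead, for example something like a 10-minute `increase()` compared against 25 times an `avg_over_time()` subquery over one week; the exact query depends on which token metric your service exposes. The function name `is_runaway` and the sample numbers are hypothetical.

```python
# Minimal sketch of the dynamic threshold from the first step: fire when the
# latest 10-minute token total exceeds 25x the average 10-minute total over
# the previous week. Window sizes mirror the text; the data is made up.

WINDOW_MINUTES = 10
WINDOWS_PER_WEEK = 7 * 24 * 60 // WINDOW_MINUTES  # 1,008 ten-minute windows
MULTIPLIER = 25.0


def is_runaway(history: list[int], latest: int, multiplier: float = MULTIPLIER) -> bool:
    """Hypothetical check. history: token totals per 10-minute window of the
    past week, excluding the latest window."""
    if not history:
        return False  # no baseline yet, so don't fire
    baseline = sum(history) / len(history)  # one-week average per window
    return latest > multiplier * baseline


# A steady week of 40,000 tokens per 10 minutes (threshold: 1,000,000).
week = [40_000] * WINDOWS_PER_WEEK

print(is_runaway(week, latest=90_000))     # False: busy, but far below 25x
print(is_runaway(week, latest=1_500_000))  # True: 37.5x the baseline
```

Keeping the latest window out of the baseline is a design choice in this sketch, so a single huge window cannot inflate the average it is compared against.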
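The second and third steps hinge on a small handler in the Cloud Run function. Here is a minimal Python sketch of its decision logic, assuming a push subscription that delivers the standard Pub/Sub push envelope (base64-encoded `message.data`) and a Cloud Monitoring notification body with an `incident` object whose `state` is `open` or `closed`. Check those field names against the payloads your own channel delivers. The `handle_push` function and the `kill_switch` callable are hypothetical: `kill_switch` stands in for the Cloud Quotas API call that lowers the quota, which is not shown here.

```python
import base64
import json
from typing import Callable


def handle_push(envelope: dict, kill_switch: Callable[[str], None]) -> str:
    """Hypothetical handler for one Pub/Sub push delivery.

    envelope: the JSON body Pub/Sub pushes, with base64-encoded message data.
    kill_switch: hypothetical callable that lowers the token quota to 0
    (in practice, a Cloud Quotas API call); injected so it can be stubbed.
    """
    data = envelope.get("message", {}).get("data", "")
    notification = json.loads(base64.b64decode(data).decode("utf-8")) if data else {}
    incident = notification.get("incident", {})

    # Only act when an incident opens; "closed" notifications arrive too.
    if incident.get("state") != "open":
        return "ignored"

    kill_switch(incident.get("policy_name", "unknown policy"))
    return "quota lowered"


# Usage with a fake notification and a stub kill switch.
calls: list[str] = []
payload = {"incident": {"state": "open", "policy_name": "runaway-token-usage"}}
envelope = {"message": {"data": base64.b64encode(json.dumps(payload).encode()).decode()}}

print(handle_push(envelope, calls.append))  # quota lowered
print(calls)                                # ['runaway-token-usage']
```

Because Pub/Sub delivers messages at least once, the same incident may reach the function more than once. Setting a quota to 0 is idempotent, so repeated deliveries are harmless here.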

Sign up to be a design partner

We are working on productizing anomaly detection using dynamic thresholds so that these alerts are easier to write. We’re also working on more complex anomaly detection algorithms in Cloud Monitoring alerting that use AI models specifically trained on time-series data.

If you’re interested in sharing your thoughts and being an early adopter of what we’re building in this space, sign up to be a preview partner. We’d love to have you!
