Deep dive into BigQuery AI.AGG function

While BigQuery already offers powerful AI functions that help you analyze individual rows of data, analyzing unstructured data at scale requires a different approach. AI.AGG() lets you ask questions from unstructured data such as logs and documents, for example:

What are the top three feature requests among the negative product reviews?
What kind of errors are users seeing most frequently, and how should I start investigating them?
In which specific scenarios is our automated agent consistently failing to resolve customer issues?

In this post, we’ll dive deeper into the AI.AGG() function and look at a few of the use cases that it unlocks, including how it can be used in combination with BigQuery’s other managed AI functions for complex, intelligent data analysis.

Analyzing system logs with AI.AGG()

A great example of the power of AI.AGG() is analyzing system logging. Log messages, warnings, errors, and stack traces can contain extremely useful information for improving your service, but it can be time- and labor-intensive to investigate them manually — especially if you operate at scale and have thousands of them to review.

With AI.AGG(), you can easily analyze many logs at once, grouping and prioritizing them to decide which ones to dig deeper into first. In fact, our BigQuery engineering team used this exact approach while developing AI.AGG() — using the function to help identify edge cases related to input handling for the feature itself!

To demonstrate this, let’s analyze a public dataset of Apache Spark standard INFO logs available from Loghub. Often, clusters can run into issues like memory thrashing, clock drift, or broadcast bottlenecks without ever throwing a FATAL error. You can use AI.AGG() to analyze these seemingly normal logs for hidden inefficiencies. You can load the sample data file into BigQuery using any of the supported methods, such as the UI, CLI, or client libraries. The following example assumes you’ve loaded the log file into a dataset called bq_logs_demo and table named spark_logs_unstructured.

Notice how we construct the prompt here. We explicitly give the model permission to say “everything is fine,” which prevents it from hallucinating errors, while instructing it to hunt for specific anomalies:

Deep dive into BigQuery AI.AGG function

Cloud CISO Perspectives: How Google Cloud Security uses AI internally