Today, at the Apache Iceberg Summit in San Francisco, we are announcing the preview of read and write interoperability between BigQuery and Iceberg-compatible engines such as Trino and Spark on Apache Iceberg tables in the Google-managed Iceberg REST Catalog. With this new capability, you get the benefits of enterprise-grade native storage for your lakehouse without sacrificing Iceberg’s openness and flexibility.
Why it matters: If you’re building a lakehouse today, you’re probably using Apache Iceberg, which has gained massive popularity among data platform teams that need multiple compute engines (such as Spark and BigQuery) accessing the same data for different workloads. However, we consistently hear from customers that achieving this openness often requires compromises.
Compared to enterprise storage, Iceberg often carries a price-performance overhead that wipes out the cost benefits of a single-copy architecture. To make Iceberg work for all production use cases, data teams have to invest in custom infrastructure to handle real-time streaming, build complex pipelines to replicate operational data, and navigate fragmented governance across compute engines. Ultimately, these limitations become bottlenecks to innovation.
Over the years, Google has purpose-built storage infrastructure to solve these exact challenges at scale, powered by highly scalable, real-time metadata, unified governance, and deep vertical integration across Cloud Storage, metadata, and various query engines. We are making this infrastructure available directly in Iceberg.
This brings BigQuery’s advanced runtime, automatic table management, partitioning, multi-statement transactions, and change data replication to Google-managed Iceberg REST catalog tables. These features are available in preview for those tables, and will be generally available (GA) for BigQuery-managed Iceberg tables next month.
Write and read interoperability across engines
Previously, customers building lakehouses chose between Iceberg tables in the Google-managed Iceberg REST catalog and tables managed by BigQuery, based on their primary ETL engine. That meant customers relying on Apache Spark for ETL into Iceberg REST Catalog tables couldn’t write through BigQuery or use its storage management features.
With this preview, you can create, update, and query Iceberg tables in the Google serverless Iceberg REST catalog with BigQuery or other Iceberg-compatible engines such as Spark, Flink, and Trino. This two-way read and write interoperability lets data teams implement multi-engine use cases on a single table type in a fully open manner, using native Iceberg libraries.
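To make the multi-engine setup concrete, here is a minimal sketch of the Spark configuration an engine like PySpark would use to attach an Iceberg REST catalog. The property names (`type`, `uri`, `warehouse`, `token`) are standard Apache Iceberg Spark catalog options; the endpoint URL, bucket, catalog name, and token below are placeholders, not the actual Google-managed service values.

```python
# Sketch: building Spark conf entries for an Iceberg REST catalog.
# Property names are standard Iceberg Spark options; the concrete
# endpoint, bucket, and token values are placeholders (assumptions).

def iceberg_rest_catalog_conf(catalog_name: str, rest_uri: str,
                              warehouse: str, token: str) -> dict:
    """Return the Spark conf entries that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",          # talk to a REST-based catalog
        f"{prefix}.uri": rest_uri,         # catalog endpoint
        f"{prefix}.warehouse": warehouse,  # storage location, e.g. a gs:// bucket
        f"{prefix}.token": token,          # bearer token for authentication
    }

conf = iceberg_rest_catalog_conf(
    "lakehouse",                                    # placeholder catalog name
    "https://example.googleapis.com/iceberg/rest",  # placeholder endpoint
    "gs://my-bucket/warehouse",                     # placeholder bucket
    "<oauth-token>",                                # placeholder credential
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

In a real job these entries would be applied via `SparkSession.builder.config(...)`, after which the same table is reachable from Spark SQL (e.g. `SELECT * FROM lakehouse.sales.orders`) and from BigQuery, since both engines resolve it through the shared catalog.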
Additionally, Iceberg REST Catalog offers table-level access controls using credential vending for uniform governance across BigQuery, Spark and other compute engines that query or modify your Iceberg tables.
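Credential vending works by having the catalog hand each engine short-lived, table-scoped storage credentials instead of broad bucket access. In the open Iceberg REST protocol, a client opts in via the `X-Iceberg-Access-Delegation` header; the sketch below shows how that header would be added to a Spark catalog configuration. The catalog name is a placeholder carried over from a generic setup, not a value from this announcement.

```python
# Sketch: opting a Spark Iceberg REST catalog into credential vending.
# "X-Iceberg-Access-Delegation: vended-credentials" is the header defined
# by the Iceberg REST spec; the catalog name "lakehouse" is a placeholder.

def enable_credential_vending(conf: dict, catalog_name: str) -> dict:
    """Return a copy of `conf` asking the catalog to vend per-table credentials."""
    key = f"spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation"
    updated = dict(conf)                  # leave the original conf untouched
    updated[key] = "vended-credentials"   # request short-lived storage creds
    return updated

conf = enable_credential_vending({}, "lakehouse")
print(conf)
```

Because the catalog enforces table-level access control before vending anything, every engine that connects this way (BigQuery, Spark, or others) is governed by the same policy, rather than by per-engine bucket permissions.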
Google Cloud also supports a robust ecosystem of partners integrated with the Iceberg REST Catalog across data platforms and engines, transformation and ingestion services, and governance platforms. We work closely with the Iceberg ecosystem to strengthen these partnerships, with many more to come.