Backend developers and architects building high-throughput, low-latency applications increasingly rely on Valkey, an open-source, high-performance key-value datastore that supports a variety of workloads such as caching and message queues. At Google Cloud, we offer a fully managed version as part of Google Cloud Memorystore, and today, we’re excited to announce the general availability (GA) of Valkey 9.0, delivering both massive performance gains and powerful new developer capabilities.
During the preview, we saw remarkable uptake and excitement from customers who require the highest levels of performance. Organizations pushing the boundaries of scale and latency are putting Valkey 9.0 to the test for their most demanding workloads:
“A high-performance caching layer is critical to our infrastructure at Snap. We are excited to see the General Availability of Valkey 9.0 on Google Cloud Memorystore. The new architectural enhancements, including SIMD optimizations, offer strong performance benefits for throughput and latency. Having access to a managed service backed by an open standard gives us valuable flexibility in how we deploy and manage our caching workloads.” – Ovais Khan, Principal Software Engineer, Snap
This need for uncompromising speed and flexibility extends beyond social networking infrastructure into the financial sector, where real-time transaction processing and reliability are non-negotiable.
“In the financial services sector, milliseconds matter and data reliability is paramount. By utilizing Memorystore for Valkey on Google Cloud in the critical GPay stack powering the UPI Acquirer Switch for Top Indian Banks, Juspay is proud to leverage Memorystore to handle high-throughput transactional data with exceptionally low latency.
We are excited about the GA release of Valkey 9.0. The performance gains from features like pipeline memory prefetching, combined with the assurance of a fully managed, truly open-source solution, provide us with the scale and reliability necessary to securely serve all our customers.” – Arun Ramprasadh, Head of UPI, Juspay
Similarly, in the media and entertainment space, delivering uninterrupted experiences to massive audiences requires a caching layer capable of instantly absorbing traffic spikes.
“During the preview of Valkey 9.0 on Google Cloud Memorystore, we’ve experienced amazing performance and stability. Live streaming demands absolutely minimal latency and maximum throughput. The architectural enhancements in Valkey 9.0 allow us to scale our caching layer more efficiently to handle traffic spikes during major events. Relying on a fully managed, open-source solution ensures Fubo can deliver a seamless viewing experience to our audience.” – Kevin Anthony, Platform Engineering Manager, Fubo
Performance at scale: Speed without compromise
Valkey 9.0 is engineered for raw speed. Building on the enhanced I/O threading architecture introduced in Valkey 8.0, Valkey 9.0 delivers significantly higher throughput and lower latency on multi-core VMs. The performance gains are driven by several architectural enhancements, as highlighted in the official Valkey 9.0 release announcement on the Valkey blog:
- Pipeline memory prefetching: This optimization increases throughput by up to 40% by improving memory access efficiency during pipelining.
- Zero copy responses: For large requests, this feature avoids internal memory copying, yielding up to 20% higher throughput.
- SIMD optimizations: By utilizing SIMD for BITCOUNT and HyperLogLog operations, Valkey 9.0 delivers up to 200% higher throughput for these common tasks.

(Note: The figures above are based on open-source benchmarks; actual performance improvements will vary depending on your specific workloads.)
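To make the SIMD item concrete: BITCOUNT returns the number of set bits in a string value, which is exactly the kind of byte-by-byte work that vectorized instructions accelerate. The sketch below is an illustrative pure-Python model of the command's semantics only; it is not how Valkey's SIMD implementation works.

```python
def bitcount(value: bytes) -> int:
    """Count the set bits across all bytes of a value, mirroring
    what the BITCOUNT command computes for a string key."""
    return sum(bin(byte).count("1") for byte in value)

# The classic example from the command documentation:
print(bitcount(b"foobar"))  # 26
```

Valkey performs this population count server-side in a tight loop over the value's bytes, which is why applying SIMD to it yields such a large speedup.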
The mechanics of throughput: Pipelining
In Valkey, latency is largely constrained by network round-trip time (RTT). If an application waits for each response before initiating the next request, total throughput remains bound by this round-trip latency. Pipelining addresses this by decoupling throughput from latency, allowing multiple requests to be sent over a single connection without awaiting each response in turn.
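The arithmetic behind this is worth spelling out. Under a simplified model that assumes a fixed network RTT and negligible server processing time, sending commands one at a time costs one full RTT per command, while pipelining in batches costs one RTT per batch:

```python
import math

def total_time_sequential(num_commands: int, rtt_ms: float) -> float:
    """Each command waits for its response: one full RTT per command."""
    return num_commands * rtt_ms

def total_time_pipelined(num_commands: int, rtt_ms: float,
                         batch_size: int) -> float:
    """Commands are batched over one connection: one RTT per batch."""
    return math.ceil(num_commands / batch_size) * rtt_ms

# 1,000 commands at 1 ms RTT:
print(total_time_sequential(1000, 1.0))      # 1000.0 ms
print(total_time_pipelined(1000, 1.0, 100))  # 10.0 ms
```

In this model, pipelining 1,000 commands in batches of 100 cuts network wait time by 100x, which is why the throughput optimizations in Valkey 9.0, such as pipeline memory prefetching, target exactly this access pattern.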






