
Bang for the Buck: Vector Search on Cloud CPUs (2505.07621v1)

Published 12 May 2025 in cs.DB and cs.AI

Abstract: Vector databases have emerged as a new type of systems that support efficient querying of high-dimensional vectors. Many of these offer their database as a service in the cloud. However, the variety of available CPUs and the lack of vector search benchmarks across CPUs make it difficult for users to choose one. In this study, we show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, in an IVF index on float32 vectors, AMD's Zen4 gives almost 3x more queries per second (QPS) compared to Intel's Sapphire Rapids, but for HNSW indexes, the tables turn. However, when looking at the number of queries per dollar (QP$), Graviton3 is the best option for most indexes and quantization settings, even over Graviton4 (Table 1). With this work, we hope to guide users in getting the best "bang for the buck" when deploying vector search systems.

Summary

Evaluating Vector Search Efficiency on Cloud CPUs

The development and deployment of vector databases optimized for querying high-dimensional vectors have gained substantial momentum. This paper examines how vector similarity search (VSS) performance varies across CPU microarchitectures, particularly in cloud environments.

Core Analysis

The authors present a detailed benchmark study using a range of microarchitectures available on AWS, namely AMD's Zen3 and Zen4, Intel's Sapphire Rapids and its Z variant, and AWS's Graviton3 and Graviton4. The evaluation covers vector search scenarios using the two dominant index structures, IVF and HNSW, alongside full-scan operations.

The study builds on the vector search libraries FAISS and USearch, with indices quantized at levels ranging from full 32-bit floats down to binary quantization. Graviton3 emerges as the most cost-efficient option, consistently delivering the best queries per dollar (QP$) across diverse search scenarios, followed by Zen4, which shows strong performance on IVF queries. Despite being its successor, Graviton4 does not consistently outperform Graviton3, largely due to architectural differences that affect latency and throughput.
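
To make the two index structures concrete, the following minimal sketch contrasts an IVF-style search with a full scan using plain NumPy. It is an illustration only, not the paper's benchmark code: FAISS trains the coarse quantizer with k-means, whereas here the centroids are simply sampled data points, and the dataset sizes are toy-scale.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_lists = 32, 2000, 16

data = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Coarse quantizer: fixed "centroids" sampled from the data
# (a simplification; FAISS trains these with k-means).
centroids = data[rng.choice(n_vectors, n_lists, replace=False)]
assignments = np.argmin(
    ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)
# Inverted lists: vector ids grouped by their nearest centroid.
inverted_lists = {c: np.where(assignments == c)[0] for c in range(n_lists)}

def ivf_search(query, nprobe=4, k=5):
    """Scan only the nprobe inverted lists closest to the query."""
    probe = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in probe])
    d = ((data[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(d)[:k]]

def full_scan(query, k=5):
    """Exact baseline: compare the query against every vector."""
    d = ((data - query) ** 2).sum(-1)
    return np.argsort(d)[:k]
```

The sketch also hints at the paper's data-access argument: `ivf_search` reads a few contiguous candidate lists sequentially, while an HNSW traversal (not shown) would hop between graph neighbors scattered across memory.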

Technical Insights

  1. Data-Access Patterns: The paper emphasizes the pivotal role of the data-access patterns inherent to each indexing approach. IVF benefits from sequential data access, which aligns with the high-bandwidth, low-latency memory of the Graviton processors and contributes to their favorable performance. In contrast, HNSW's non-sequential, cache-unfriendly access patterns are handled best by the Intel architectures.
  2. SIMD Capabilities: The paper underscores the importance of single instruction, multiple data (SIMD) implementations for efficient distance calculations. It highlights disparities in SIMD usage across CPUs, with Graviton's NEON/SVE kernels and Zen4's AVX-512 showing different levels of effectiveness across data types.
  3. Quantization Implications: The results show that the quantization level significantly impacts performance: Graviton CPUs suffer under FAISS's scalar (non-vectorized) quantized distance kernels, but fare well with USearch's optimized symmetric kernels.
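
The quantization point can be illustrated with a small NumPy sketch of symmetric int8 scalar quantization, where both database vectors and queries are mapped to the same int8 grid and distances are accumulated in integer arithmetic. This is a hedged illustration of the general technique, not the specific FAISS or USearch kernels; real SIMD kernels do the integer accumulation with vector instructions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 64
vecs = rng.standard_normal((100, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

# Symmetric scalar quantization: map [-limit, limit] onto [-127, 127].
limit = np.abs(vecs).max()
scale = 127.0 / limit

def quantize(x):
    return np.clip(np.round(x * scale), -127, 127).astype(np.int8)

q_vecs = quantize(vecs)
q_query = quantize(query)

# Squared L2 distances on int8 codes, accumulated in int32 (as SIMD
# integer dot-product instructions do), then rescaled to the float domain.
d_quant = ((q_vecs.astype(np.int32) - q_query.astype(np.int32)) ** 2
           ).sum(-1) / scale ** 2
d_float = ((vecs - query) ** 2).sum(-1)
```

Because the codes are symmetric, no dequantization is needed inside the distance loop, which is why a well-vectorized symmetric kernel can be much faster than a scalar fallback while staying close to the full-precision distances.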

Implications for Cloud Deployment

The findings have significant implications for businesses and researchers running vector databases in the cloud. By selecting CPU instances that match the index type and data characteristics, users can achieve substantial gains in query efficiency and cost-effectiveness. The paper guides practitioners in allocating resources for different workloads, affirming that Graviton3 and Zen4 are cost-effective options, especially for high-dimensional data.
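
The cost metric itself is simple arithmetic: queries per dollar combines measured throughput with the instance's hourly price. The sketch below uses hypothetical QPS and price figures purely for illustration (real QPS comes from benchmarking and real prices from the AWS pricing page), and shows how a slower but cheaper instance can still win on QP$.

```python
# Hypothetical figures for illustration only; instance names in the keys
# (r7g, r7a) are assumptions, not results reported by the paper.
instances = {
    "graviton3 (r7g)": {"qps": 1200.0, "usd_per_hour": 1.0},
    "zen4 (r7a)":      {"qps": 1500.0, "usd_per_hour": 1.5},
}

def queries_per_dollar(qps, usd_per_hour):
    # QPS is per second; one dollar buys 3600 / usd_per_hour seconds.
    return qps * 3600.0 / usd_per_hour

ranking = sorted(
    instances,
    key=lambda name: queries_per_dollar(**instances[name]),
    reverse=True,
)
```

With these made-up numbers, the Graviton3 instance delivers 20% less raw QPS yet 20% more queries per dollar, mirroring the paper's headline conclusion.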

Future Directions

Future iterations of this research could address emerging vector-centric workloads, including multi-threaded or batch-query models, and extend the comparison to GPU architectures. Additionally, examining the interplay between vector database systems and these hardware configurations could uncover optimizations that match software capabilities to inherent hardware strengths, setting benchmarks for future developments in VSS technologies.

In summary, this paper provides a granular understanding of the interplay between CPU architecture and vector database performance, offering valuable guidance for optimizing vector search across cloud platforms. The analysis advances the field's understanding of hardware-dependent efficiencies, identifying which architectures best suit specific vector search workloads.
