
BatANN: Distributed Disk-Based ANN Search

Updated 11 December 2025
  • BatANN is a distributed, disk-based ANN search system designed for massive vector datasets using a global proximity graph and an efficient baton-passing protocol.
  • It partitions the global graph with a neighborhood-aware algorithm to minimize inter-server I/O and maintain logarithmic search efficiency.
  • Performance evaluations demonstrate low latency (<6ms) and high recall (0.95@10) with near-linear throughput scaling across multiple servers.

BatANN is a distributed, disk-based approximate nearest neighbor (ANN) search system designed to support high-throughput, low-latency vector search over datasets too large to fit in memory or on a single server. By integrating a single global proximity graph, neighborhood-aware partitioning, and a baton-passing protocol that transfers full query state across servers, BatANN ensures logarithmic search efficiency, minimal inter-server I/O, and near-linear throughput scaling as the number of servers increases. It is the first open-source distributed disk-based vector search system to operate over a single global graph (Dang et al., 10 Dec 2025).

1. Problem Setting and Motivation

Modern information-retrieval applications—including retrieval-augmented generation (RAG), large-scale image search, and recommendation—require efficient $k$-nearest neighbor (k-NN) queries in high-dimensional embedding spaces. A brute-force (exact) search has complexity $O(N)$ or worse due to dimensionality, making it impractical for large $N$. Graph-based ANN methods, empirically achieving $O(\log N)$ convergence using proximity graphs, are the dominant scalable solution.

As dataset sizes approach billions of points, global index structures exceed DRAM capacity. Disk-based systems (e.g., DiskANN, Starling) store full embeddings and neighbor lists on SSDs, retaining quantized vector summaries (PQ codes) in memory to prune candidate sets. However, a single-server design is ultimately limited by SSD IOPS and bandwidth. Distributing storage and computation across multiple machines can raise query-per-second (QPS) throughput, but naïve sharding ("scatter–gather") strategies lose logarithmic search efficiency and waste I/O. Prior distributed solutions based on global graphs incur high inter-server latency due to round-trip communication per graph hop.

Target system metrics are Recall@k (e.g., 0.95 at $k = 10$), throughput (QPS) at fixed recall, and end-to-end latency below 6 ms even at very high QPS (Dang et al., 10 Dec 2025).

2. Index Structure and System Architecture

2.1 Global Proximity Graph (Vamana)

BatANN constructs a single global proximity graph $G = (V, E)$ using the Vamana algorithm. Each node in $V$ (representing a datapoint) maintains up to $R$ out-edges ($R = 64$ in practice), selected by a diversifying heuristic to improve graph navigability. Formally, each node $i$ retains a neighbor set $N(i)$ with $|N(i)| \leq R$, designed so that greedy traversal from any starting point converges to the true nearest neighbors in $O(\log N)$ steps.
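To make the traversal concrete, the following is a minimal, self-contained sketch of greedy beam search over an in-memory proximity graph (standard Vamana/DiskANN-style search rather than BatANN's actual code); the graph representation, `entry_point`, and parameter names are illustrative.

```python
import heapq
import numpy as np

def beam_search(graph, vectors, query, entry_point, L=64, k=10):
    """Greedy beam search over a proximity graph.

    graph:   dict mapping node id -> list of neighbor ids (at most R per node)
    vectors: np.ndarray of shape (N, d) with full-precision embeddings
    Returns the ids of the k closest nodes found.
    """
    dist = lambda v: float(np.linalg.norm(vectors[v] - query))
    beam = [(dist(entry_point), entry_point)]   # candidate list, capped at L entries
    explored = set()

    while True:
        # Expand the closest beam entry that has not been explored yet.
        unexplored = [(d, v) for d, v in beam if v not in explored]
        if not unexplored:
            break
        _, v = min(unexplored)
        explored.add(v)
        for u in graph[v]:
            if all(u != w for _, w in beam):
                beam.append((dist(u), u))
        beam = heapq.nsmallest(L, beam)         # keep only the L best candidates

    return [v for _, v in heapq.nsmallest(k, beam)]
```

In BatANN the same loop runs over neighbor lists stored on SSD, with in-memory PQ distances standing in for the exact distances used here.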

Indexing details:

  • On-disk: Each vector's full (float) embedding and its neighbor ID list are stored, typically compressed to fit within a 4 KB SSD sector.
  • In-memory: Product-quantized (PQ) codes, consuming 32 bytes per 128-dimensional vector, guide the search. Overall DRAM usage is $\approx 32 \cdot N$ bytes (see the sizing sketch below).
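The figures above imply simple sizing arithmetic; the sketch below assumes 128-dimensional float32 vectors, $R = 64$ neighbor IDs of 4 bytes each, and 32-byte PQ codes, and is illustrative only.

```python
def disk_record_bytes(dim=128, dtype_bytes=4, max_degree=64, id_bytes=4):
    """Size of one node's on-disk record: full vector plus neighbor ID list."""
    return dim * dtype_bytes + max_degree * id_bytes

def pq_dram_bytes(n_points, pq_code_bytes=32):
    """DRAM needed for the replicated PQ codes (~32 bytes per vector)."""
    return n_points * pq_code_bytes

record = disk_record_bytes()              # 128*4 + 64*4 = 768 bytes
print(record, record <= 4096)             # fits comfortably in one 4 KB SSD sector
print(pq_dram_bytes(10**9) / 2**30)        # ~29.8 GiB of PQ codes for 1B points
```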

2.2 Distributed Graph Partitioning

The global graph is partitioned into $P$ server shards using a neighborhood-aware graph partitioner (Gottesbüren et al., 4 Mar 2024), minimizing off-server beam-search hops. Empirical results indicate that, with 10 servers, only 20–25% of graph hops cross server boundaries.

Each server is provisioned as follows:

  • Replicated PQ codes for all $N$ vectors.
  • Local SSD storage for embeddings and neighbor lists of its partition.
  • An in-memory “head” graph for rapid starting-point selection, consisting of a 1% random sample of $G$ (see the sketch following this list).
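A minimal sketch of how such a head index might be used to choose a starting point, assuming the 1% sample holds full-precision vectors and is scanned by brute force; in practice the head structure is a small in-memory graph, so the scan here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_head_index(num_points, sample_rate=0.01):
    """Pick a random 1% sample of node ids to keep resident in memory."""
    size = max(1, int(num_points * sample_rate))
    return rng.choice(num_points, size=size, replace=False)

def pick_entry_point(vectors, head_ids, query):
    """Scan the head sample and return the closest node as the beam-search start."""
    dists = np.linalg.norm(vectors[head_ids] - query, axis=1)
    return int(head_ids[np.argmin(dists)])
```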

3. The Baton-Passing Protocol

3.1 Motivation

Previous distributed global-graph approaches incur at least two network round-trips per off-server neighbor access (request + response). BatANN’s key innovation is to transfer ("pass") the entire query’s state to the owning server, which then continues the beam search locally. This asynchronous baton-passing approach reduces per-hop network overhead by nearly half and aligns with modern high-throughput networking and SSD architectures.

3.2 Query State and Protocol

A query’s “baton” (state $S$) includes:

  • Beam $P$ (size $L$, each entry $\langle$node, approx-dist$\rangle$),
  • Explored set $E \subseteq P$,
  • Full-precision result list for reranking,
  • Parameters $(L, k, W)$ and the query embedding $q$.

For typical $L \in [200, 400]$, the message size is 4–8 KB.
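A sketch of what such a baton might look like on the wire, assuming 4-byte node IDs, 4-byte float distances, and a 128-dimensional float32 query; the field names and size accounting are illustrative, not BatANN's actual message format.

```python
from dataclasses import dataclass, field

@dataclass
class Baton:
    """Query state handed from server to server (illustrative layout)."""
    query: list[float]                   # query embedding (d floats)
    beam: list[tuple[int, float]]        # (node id, approximate distance), at most L entries
    explored: set[int] = field(default_factory=set)
    results: list[tuple[int, float]] = field(default_factory=list)  # full-precision rerank list
    L: int = 200
    k: int = 10
    W: int = 8

def approx_wire_bytes(baton: Baton, dim: int = 128) -> int:
    """Rough serialized size: 4-byte ids and floats, ignoring framing overhead."""
    return (dim * 4                      # query embedding
            + len(baton.beam) * 8        # id + distance per beam entry
            + len(baton.explored) * 4
            + len(baton.results) * 8
            + 3 * 4)                     # L, k, W
```

With $L = 200$ and a mostly explored beam this comes to a few kilobytes, in line with the 4–8 KB figure above.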

The distributed beam-search protocol operates as follows (a simplified simulation of these steps appears after the list):

  1. The client hashes $q$ to a designated server $s_0$, which performs an in-memory head-index search and initializes $S$.
  2. While unexplored beam nodes remain:
    • Select up to $W = 8$ unexplored nodes.
    • If any are local: issue concurrent io_uring SSD reads and update the beam and explored set.
    • If all are remote: select the node $v^*$ with minimum distance, serialize $S$, and transfer the baton to the server owning $v^*$. That server resumes execution.
  3. Once all beam nodes are explored, the holding server returns the top-$k$ results to the client.
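The single-process simulation below sketches the control flow of steps 1–3: each "server" owns a shard of the graph, and when every unexplored candidate is remote the whole query state moves to the server owning the closest one instead of issuing a remote read. Exact distances and plain dictionaries stand in for PQ codes, io_uring, and the network, so this illustrates the baton-passing logic only, not BatANN's implementation; the modulo ownership rule is a placeholder for the neighborhood-aware partitioner.

```python
import numpy as np

def owner(node, P):
    """Toy ownership rule: node id modulo the number of servers."""
    return node % P

def baton_search(graph, vectors, query, entry, P, L=32, W=8, k=10):
    dist = lambda v: float(np.linalg.norm(vectors[v] - query))
    beam = {entry: dist(entry)}          # node -> approximate distance, capped at L entries
    explored = set()
    server = owner(entry, P)             # server currently holding the baton
    hops = passes = 0

    while True:
        unexplored = sorted((d, v) for v, d in beam.items() if v not in explored)
        if not unexplored:
            break
        # Prefer candidates local to the server holding the baton (up to W at a time).
        local = [v for _, v in unexplored if owner(v, P) == server][:W]
        if local:
            for v in local:              # stands in for W concurrent SSD reads
                explored.add(v)
                for u in graph[v]:
                    beam.setdefault(u, dist(u))
            beam = dict(sorted(beam.items(), key=lambda kv: kv[1])[:L])
            hops += 1
        else:
            # Every unexplored candidate is remote: hand the baton to the server
            # owning the closest one and continue the search there.
            v_star = unexplored[0][1]
            server = owner(v_star, P)
            passes += 1

    top = sorted(beam.items(), key=lambda kv: kv[1])[:k]
    return [v for v, _ in top], hops, passes
```

In the real protocol the baton pass serializes $S$ over TCP to the remote server, while the local branch issues up to $W$ concurrent io_uring reads against the shard's SSD.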

Each baton-passing step incurs latency $t_{\text{ser}}$ (serialization, $\sim 10\,\mu$s) plus $t_{\text{net}}$ (TCP, $\sim 50\,\mu$s on 25 GbE) per inter-server hop.
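As a rough, illustrative estimate (the hop counts here are assumptions, not reported figures): with on the order of 100 graph hops per query and the 20–25% cross-server fraction noted in Section 2.2, roughly 20 hops involve a baton pass, contributing about $20 \cdot (10 + 50)\,\mu\text{s} = 1.2$ ms of serialization and network time and leaving most of the sub-6 ms latency budget for SSD reads and distance computations.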

4. Complexity, Efficiency, and Optimizations

4.1 Computational Cost

On a single server holding $N/P$ points, greedy beam search converges in $O(\log(N/P))$ steps. Retaining a single global graph ensures total hops remain $O(\log N)$ regardless of $P$. Each step issues up to $W$ concurrent SSD reads, and with high-random-IOPS NVMe SSDs ($\geq$ 300K random IOPS), increasing $W$ has negligible impact on per-step read cost.

4.2 Throughput Scaling

Because per-query disk I/O and distance computation are nearly constant irrespective of $P$, distributing the same workload enables near-linear scalability. Throughput on $P$ servers can be modeled as $\text{QPS}_P \approx P \cdot \text{QPS}_1 \cdot \eta$, where $\eta \in [0.9, 1.0]$ accounts for inter-server hop overhead.
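As a worked example with assumed numbers: a single server sustaining $\text{QPS}_1 = 10{,}000$ at fixed recall would, with $\eta = 0.9$, yield $\text{QPS}_{10} \approx 10 \cdot 10{,}000 \cdot 0.9 = 90{,}000$ queries per second on ten servers.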

4.3 Locality and Algorithmic Heuristics

Graph partitioning (Gottesbüren et al., 4 Mar 2024) keeps 75–88% of hops local, limiting baton passes. When $W > 1$, a heuristic additionally processes all local candidates before considering a baton pass, further reducing inter-server communication.

4.4 Batching and Caching

Each thread processes a fixed set of 8 queries in a pipelined, interleaved manner to overlap SSD I/O and computation, yielding 20–30% higher single-server throughput. The query embedding $q$ is cached after the initial transmission and reused until the final result acknowledgment, eliminating duplicate transfers.
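A minimal sketch of the interleaving idea using Python generators: each query is written as a coroutine that yields wherever it would block on an SSD read, and a worker rotates through a fixed batch so that one query's I/O stall overlaps another's computation. The batch size of 8 matches the figure above; the function names and the simulated hops are illustrative.

```python
from collections import deque

def query_pipeline(query_id, n_hops=5):
    """One query as a coroutine: yields at each point where it would wait on an SSD read."""
    for hop in range(n_hops):
        yield ("issue_read", query_id, hop)   # submit an async read, then let other queries run
    return f"results for query {query_id}"

def run_batch(query_ids, batch_size=8):
    """Round-robin over a fixed batch of queries, overlapping their I/O stalls."""
    active = deque(query_pipeline(q) for q in query_ids[:batch_size])
    finished = []
    while active:
        coro = active.popleft()
        try:
            next(coro)               # advance this query until its next simulated I/O stall
            active.append(coro)      # park it and move on to the next query in the batch
        except StopIteration as done:
            finished.append(done.value)
    return finished

print(run_batch(list(range(8))))
```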

5. Experimental Evaluation

5.1 Datasets and Experimental Setup

The system is evaluated on 10-node CloudLab c6620 clusters (28-core Intel Xeon Gold, 128 GB DRAM, 25 GbE) using standard Big-ANN benchmarks:

  • BIGANN: 100M and 1B SIFT 128-dim (uint8), L2 distance,
  • MSSPACEV: 100M and 1B SpaceV 100-dim (int8),
  • DEEP: 100M 96-dim float conv-net features.

5.2 Baselines

  • ScatterGather: Identical graph and partitioning, but each subgraph is queried independently then merged (no global graph).
  • Single-server baselines: DiskANN, PipeANN, CoroSearch (for microbenchmarking).

Indexes are built using Vamana with $R = 64$, $L = 128$, $\alpha = 1.2$ (PipeANN for the 100M-point datasets; ParlayANN for the 1B-point datasets).

5.3 Performance Highlights

At Recall@10 = 0.95:

  • For 100M points on 10 servers, BatANN achieves a 6.21–6.49× QPS improvement over ScatterGather (e.g., 6.49× on DEEP).
  • For 1B points on 10 servers, BatANN achieves 5.10× (BIGANN) and 2.5× (MSSPACEV).
  • BatANN's total disk I/O and distance calculations remain within 1% of single-server levels; in contrast, ScatterGather's I/O and compute scale proportionally with $P$.
  • QPS scales as $0.9 \cdot P$ at fixed recall up to 10 servers; ScatterGather saturates after a few shards.
| Dataset  | Size     | Servers | QPS Speedup (BatANN vs ScatterGather) | Mean Latency (ms) |
|----------|----------|---------|----------------------------------------|-------------------|
| BIGANN   | 100M, 1B | 10      | 6.21–6.49× (100M), 5.10× (1B)          | ≤ 6               |
| MSSPACEV | 100M, 1B | 10      | 6.21–6.49× (100M), 2.5× (1B)           | ≤ 6               |
| DEEP     | 100M     | 10      | 6.49×                                  | ≤ 6               |

At saturating QPS rates, BatANN maintains mean latency $\leq 6$ ms (rising only 13% from 5 to 10 servers), whereas ScatterGather's tail latency degrades rapidly above 6K QPS.

5.4 Beam Width Ablation

Increasing the beam width from $W = 1$ to $W = 8$ reduces total and inter-server hops by approximately 4×, yielding over 2× higher QPS and about half the latency at 0.95 recall@10, with no extra CPU or I/O cost.

6. Limitations and Future Directions

  • Message size: Each baton currently sends the entire result list ($L$ nodes); streaming only the beam state and piggybacking partial results is a proposed future optimization.
  • Disk layout: Incorporating locality-aware disk placement (as in Starling’s 4 KB-aligned strategy) and dynamic pipeline width (as in PipeANN) could further minimize I/O and latency.
  • Dynamic updates: Efficient distributed point insert/delete without full repartitioning remains unresolved; multi-node in-place update mechanisms have yet to be generalized.
  • Fault tolerance: There is no current replication or fail-over; integration with state machine replication systems such as Derecho is a suggested extension.
  • Network optimizations: Although commodity TCP suffices for multi-KB baton messages, the architecture would readily enable RDMA or CXL deployment, whose benefits are yet to be quantified.
  • Multitenancy: Resource sharing and scheduling for multiple distinct vector search databases per cluster remain open research issues.

A plausible implication is that BatANN’s methodology may serve as a blueprint for distributed sublinear-time search over massive graph-structured data, given continued progress on distributed dynamic indexing and system-level robustness (Dang et al., 10 Dec 2025).
