BatANN: Distributed Disk-Based ANN Search
- BatANN is a distributed, disk-based ANN search system designed for massive vector datasets using a global proximity graph and an efficient baton-passing protocol.
- It partitions the global graph with a neighborhood-aware algorithm to minimize inter-server I/O and maintain logarithmic search efficiency.
- Performance evaluations demonstrate low latency (<6 ms) and high recall (0.95 recall@10) with near-linear throughput scaling across multiple servers.
BatANN is a distributed, disk-based approximate nearest neighbor (ANN) search system designed to support high-throughput, low-latency vector search over datasets too large to fit in memory or on a single server. By integrating a single global proximity graph, neighborhood-aware partitioning, and a baton-passing protocol that transfers full query state across servers, BatANN ensures logarithmic search efficiency, minimal inter-server I/O, and near-linear throughput scaling as the number of servers increases. It is the first open-source distributed disk-based vector search system to operate over a single global graph (Dang et al., 10 Dec 2025).
1. Problem Setting and Motivation
Modern information-retrieval applications, including retrieval-augmented generation (RAG), large-scale image search, and recommendation, require efficient k-nearest-neighbor (k-NN) queries in high-dimensional embedding spaces. Brute-force (exact) search scales at least linearly with dataset size and degrades further with dimensionality, making it impractical for large datasets. Graph-based ANN methods, which empirically converge in near-logarithmic time by greedily traversing proximity graphs, are the dominant scalable solution.
As dataset sizes approach billions of points, global index structures exceed DRAM capacity. Disk-based systems (e.g., DiskANN, Starling) store full embeddings and neighbor lists on SSDs, retaining quantized vector summaries (PQ codes) in memory to prune candidate sets. However, a single-server design is ultimately limited by SSD IOPS and bandwidth. Distributing storage and computation across multiple machines can raise query-per-second (QPS) throughput, but naïve sharding ("scatter–gather") strategies lose logarithmic search efficiency and waste I/O. Prior distributed solutions based on global graphs incur high inter-server latency due to round-trip communication per graph hop.
Target system metrics are Recall@k (e.g., 0.95 at k = 10), throughput (queries per second, QPS) at fixed recall, and end-to-end latency below 6 ms even at very high QPS (Dang et al., 10 Dec 2025).
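For concreteness, Recall@k has the standard definition below; the formula is stated here for reference and is not quoted from the paper.

```latex
% Standard definition: A_q is the returned candidate set and G_q the true
% k-nearest-neighbor set of query q (both of size k).
\mathrm{Recall@}k \;=\; \frac{|A_q \cap G_q|}{k}
```

At k = 10, the 0.95 target means that, on average, 9.5 of the 10 true nearest neighbors appear in the returned list.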
2. Index Structure and System Architecture
2.1 Global Proximity Graph (Vamana)
BatANN constructs a single global proximity graph over the entire dataset using the Vamana algorithm. Each node in the graph (representing a datapoint) maintains a bounded number of out-edges, selected by a diversifying pruning heuristic to improve graph navigability. Formally, each node v retains a bounded-size neighbor set N(v), chosen so that greedy traversal from any starting point converges to the true nearest neighbors in O(log n) steps.
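The diversifying selection can be illustrated with a short sketch in the spirit of Vamana's alpha-pruning rule; the function name, default parameters (`R=64`, `alpha=1.2`), and the distance computation below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def robust_prune(point, candidates, data, R=64, alpha=1.2):
    """Diversifying out-edge selection in the spirit of Vamana's alpha-pruning.

    point      : id of the node whose out-edges are being chosen
    candidates : iterable of candidate neighbor ids
    data       : (n, d) array of embeddings
    R, alpha   : max out-degree and diversification factor (illustrative values)
    """
    # Closest candidates first.
    cand = sorted(set(candidates) - {point},
                  key=lambda c: np.linalg.norm(data[c] - data[point]))
    neighbors = []
    while cand and len(neighbors) < R:
        p_star = cand.pop(0)          # keep the closest remaining candidate
        neighbors.append(p_star)
        # Discard candidates that p_star already "covers"; keeping only points
        # that are not much closer to p_star than to `point` spreads the edges
        # in diverse directions and improves navigability of the graph.
        cand = [c for c in cand
                if alpha * np.linalg.norm(data[p_star] - data[c])
                > np.linalg.norm(data[point] - data[c])]
    return neighbors

rng = np.random.default_rng(0)
pts = rng.standard_normal((1_000, 16)).astype(np.float32)
print(robust_prune(0, range(1, 1_000), pts, R=8)[:5])
```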
Indexing details:
- On-disk: Each vector's full (float) embedding and its neighbor ID list are stored, typically compressed to fit within a 4 KB SSD sector.
- In-memory: Product-quantized (PQ) codes, consuming 32 bytes per 128-dimensional vector, guide the search; overall DRAM usage is therefore roughly 32 bytes per indexed point (see the arithmetic sketch after this list).
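A back-of-the-envelope sketch of the layout arithmetic; the field sizes (4-byte IDs, float32 embeddings, out-degree 64) are assumptions chosen for illustration rather than values from the paper.

```python
def node_disk_bytes(dim=128, max_degree=64, id_bytes=4, float_bytes=4):
    """Approximate on-disk footprint of one graph node (assumed field sizes)."""
    embedding = dim * float_bytes               # e.g. 128 * 4 = 512 bytes
    neighbor_list = 4 + max_degree * id_bytes   # degree count + neighbor ids
    return embedding + neighbor_list            # well under one 4 KB sector

def pq_dram_bytes(n_points, pq_bytes=32):
    """In-memory PQ summary: ~32 bytes per vector, replicated on every server."""
    return n_points * pq_bytes

assert node_disk_bytes() <= 4096                # one node fits in a 4 KB read
print(f"{pq_dram_bytes(10**9) / 2**30:.1f} GiB of PQ codes for 1B points")
```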
2.2 Distributed Graph Partitioning
The global graph is partitioned into m server shards using a neighborhood-aware graph partitioner (Gottesbüren et al., 4 Mar 2024), minimizing off-server beam-search hops. Empirical results indicate that, with 10 servers, only 20–25% of graph hops cross server boundaries.
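The quantity the partitioner tries to keep small can be made concrete with a tiny helper that measures, for a given assignment, the fraction of graph edges whose endpoints land on different servers; this is an illustrative proxy for the objective, not the partitioner itself.

```python
def cross_partition_edge_fraction(adjacency, assignment):
    """Fraction of directed edges whose endpoints live on different servers.

    adjacency  : dict node_id -> list of neighbor ids
    assignment : dict node_id -> server id
    Fewer cross-partition edges means fewer beam-search hops that must leave
    the local server, and hence fewer baton passes.
    """
    total = cross = 0
    for u, neighbors in adjacency.items():
        for v in neighbors:
            total += 1
            cross += assignment[u] != assignment[v]
    return cross / total if total else 0.0

# Toy example: a 4-node cycle split across 2 servers -> half the edges cross.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(cross_partition_edge_fraction(adj, {0: 0, 1: 0, 2: 1, 3: 1}))  # 0.5
```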
Each server is provisioned as follows:
- Replicated PQ codes for all vectors.
- Local SSD storage for embeddings and neighbor lists of its partition.
- An in-memory “head” graph for rapid starting-point selection, built over a roughly 1% random sample of the points (see the sketch after this list).
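A minimal sketch of how an in-memory head index can pick an entry point; BatANN builds a navigable graph over the sample, whereas this sketch simply scans the sampled vectors, which yields the same kind of answer (a good nearby starting node) at this small scale.

```python
import numpy as np

def pick_entry_point(query, sample_ids, data):
    """Return the global id of the sampled point closest to the query.

    `sample_ids` indexes a ~1% random sample kept in DRAM; scanning it is cheap
    and yields a good starting node for the on-disk beam search.
    """
    dists = np.linalg.norm(data[sample_ids] - query, axis=1)
    return int(sample_ids[np.argmin(dists)])

rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 128)).astype(np.float32)
sample_ids = rng.choice(len(data), size=len(data) // 100, replace=False)
print(pick_entry_point(data[42], sample_ids, data))
```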
3. The Baton-Passing Protocol
3.1 Motivation
Previous distributed global-graph approaches incur at least two network round-trips per off-server neighbor access (request + response). BatANN’s key innovation is to transfer ("pass") the entire query’s state to the owning server, which then continues the beam search locally. This asynchronous baton-passing approach reduces per-hop network overhead by nearly half and aligns with modern high-throughput networking and SSD architectures.
3.2 Query State and Protocol
A query’s “baton” (its serialized search state) includes:
- The beam (fixed width; each entry holds a node ID and its approximate distance),
- The explored (visited) node set,
- The full-precision result list for reranking,
- The search parameters and the query embedding.
For typical settings, the baton message size is 4–8 KB.
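A minimal sketch of what such a baton could look like as a data structure; the field names, default values, and the use of pickle as the wire format are assumptions for illustration, not BatANN’s actual encoding.

```python
from dataclasses import dataclass, field
import pickle

@dataclass
class Baton:
    """Query state handed from server to server (illustrative field names)."""
    query: list                      # full query embedding
    beam: list                       # (node id, approximate PQ distance) pairs
    explored: set = field(default_factory=set)
    results: list = field(default_factory=list)  # (id, full-precision distance)
    k: int = 10                      # number of neighbors requested
    beam_width: int = 8              # illustrative default

def baton_bytes(b: Baton) -> int:
    """Rough wire size of a baton; pickle stands in for the real serializer."""
    return len(pickle.dumps(b))

b = Baton(query=[0.0] * 128, beam=[(3, 1.2), (17, 1.5)])
print(baton_bytes(b))   # grows to a few KB once beam/explored/results fill up
```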
The distributed beam-search protocol operates as follows:
- The client hashes the query to a designated entry server, which performs an in-memory head-index search and initializes the baton.
- While unexplored beam nodes remain:
- Select up to beam-width unexplored nodes from the beam.
- If any are local: issue concurrent io_uring SSD reads for them and update the beam and explored set.
- If all are remote: select the node with minimum approximate distance, serialize the baton, and transfer it to the server owning that node; that server resumes execution.
- Once all beam nodes are explored, the server currently holding the baton returns the top-k results to the client.
Each baton pass incurs serialization latency plus a single one-way TCP transfer (25 GbE) per inter-server hop, rather than a full request/response round trip.
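The per-hop decision between local I/O and a baton pass can be sketched as below. The callback signatures (`read_node`, `owner_of`, `pq_dist`, `send_baton`, `reply`) and the beam-pruning cap are assumptions, and the sequential reads stand in for the concurrent io_uring reads of the real system.

```python
import math

def process_baton(baton, local_ids, read_node, owner_of, pq_dist, send_baton, reply):
    """One server's turn holding a query baton (schematic sketch).

    read_node(v) -> (full_vector, neighbor_ids)   one local SSD read
    owner_of(v)  -> server id storing node v      from the partition map
    pq_dist(v)   -> approximate distance of node v to the baton's query
    send_baton(server, baton) forwards the serialized state;
    reply(results) returns the final top-k list to the client.
    """
    def exact_dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, baton.query)))

    while True:
        frontier = sorted((d, v) for v, d in baton.beam if v not in baton.explored)
        if not frontier:                  # beam fully explored: answer the query
            reply(sorted(baton.results, key=lambda r: r[1])[:baton.k])
            return
        batch = [v for _, v in frontier[:baton.beam_width]]
        local = [v for v in batch if v in local_ids]

        if local:
            # Expand every local candidate; the real system issues these SSD
            # reads concurrently via io_uring, here they run one after another.
            for v in local:
                vec, nbrs = read_node(v)
                baton.explored.add(v)
                baton.results.append((v, exact_dist(vec)))   # full-precision rerank
                baton.beam.extend((u, pq_dist(u)) for u in nbrs
                                  if u not in baton.explored)
            # Keep only the best entries so the baton stays a few KB (the cap
            # of 4x the beam width is an arbitrary illustrative choice).
            baton.beam = sorted(set(baton.beam), key=lambda e: e[1])[:4 * baton.beam_width]
        else:
            # All unexplored candidates are remote: hand the entire query state
            # to the server owning the closest one; no reply comes back here.
            send_baton(owner_of(batch[0]), baton)
            return
```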
4. Complexity, Efficiency, and Optimizations
4.1 Computational Cost
On a single server holding n points, greedy beam search converges in O(log n) steps. Retaining a single global graph ensures the total hop count stays O(log n) regardless of the number of servers m. Each step issues up to beam-width concurrent SSD reads, and with high-random-IOPS NVMe SSDs (on the order of 300K IOPS), widening the beam has negligible impact on per-step read cost.
4.2 Throughput Scaling
Because per-query disk I/O and distance computation are nearly constant irrespective of the number of servers m, distributing the same workload across m servers enables near-linear scalability: throughput can be modeled as roughly m times single-server throughput, discounted by a small term that accounts for inter-server hop overhead.
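The exact scaling model is not reproduced in this summary; one plausible form, with ε standing in (as an assumption) for the inter-server overhead term, is:

```latex
% Illustrative form only; \epsilon is an assumed stand-in for the
% inter-server hop overhead, not a constant taken from the paper.
\mathrm{QPS}(m) \;\approx\; \frac{m \cdot \mathrm{QPS}(1)}{1 + \epsilon},
\qquad 0 < \epsilon \ll 1
```

With only 20–25% of hops leaving the server and multi-KB batons over 25 GbE, the overhead term stays small, which is consistent with the near-linear scaling reported below.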
4.3 Locality and Algorithmic Heuristics
Graph partitioning (Gottesbüren et al., 4 Mar 2024) keeps 75–88% of hops local, limiting baton passes. A locality heuristic additionally processes all local candidates before considering a baton pass, further reducing inter-server communication.
4.4 Batching and Caching
Each search thread processes a fixed batch of 8 queries in a pipelined, interleaved manner to overlap SSD I/O with computation, yielding 20–30% higher single-server throughput. The query embedding is cached after its initial transmission and reused until the final result is acknowledged, avoiding duplicate transfers.
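A toy illustration of the interleaving idea using coroutines; the real system drives io_uring completions from worker threads, so the asyncio structure, hop count, and delays here are only stand-ins for that pipeline.

```python
import asyncio

async def run_query(qid, hops=6, io_delay=0.001):
    """Stand-in for one query's beam search: alternate SSD waits and compute."""
    for _ in range(hops):
        await asyncio.sleep(io_delay)            # models a 4 KB SSD read
        _ = sum(i * i for i in range(2_000))     # models PQ distance computations
    return qid

async def worker(batch):
    """One search thread keeps a fixed batch of queries in flight, so the CPU
    computes for one query while the SSD reads of the others are pending."""
    return await asyncio.gather(*(run_query(q) for q in batch))

# Mirrors the fixed per-thread batch of 8 interleaved queries.
print(asyncio.run(worker(range(8))))
```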
5. Experimental Evaluation
5.1 Datasets and Experimental Setup
The system is evaluated on 10-node CloudLab c6620 clusters (28-core Intel Xeon Gold, 128 GB DRAM, 25 GbE) using standard Big-ANN benchmarks:
- BIGANN: 100M and 1B SIFT 128-dim (uint8), L2 distance,
- MSSPACEV: 100M and 1B SpaceV 100-dim (int8),
- DEEP: 100M 96-dim float conv-net features.
5.2 Baselines
- ScatterGather: Uses the identical graph construction and partitioning, but each shard’s subgraph is queried independently and the results are merged (no single global graph traversal).
- Single-server baselines: DiskANN, PipeANN, CoroSearch (for microbenchmarking).
Indexes are built using Vamana (with the PipeANN builder for the 100M datasets and ParlayANN for 1B).
5.3 Performance Highlights
At Recall@10 = 0.95:
- For 100M points on 10 servers, BatANN achieves a 6.21×–6.49× QPS improvement over ScatterGather (e.g., 6.49× on DEEP).
- For 1B points on 10 servers, BatANN achieves 5.10× (BIGANN) and 2.5× (MSSPACEV).
- BatANN’s total disk I/O and distance calculations remain within 1% of single-server levels; in contrast, ScatterGather’s I/O and compute scale proportionally with the number of servers m.
- QPS scales near-linearly with the number of servers at fixed recall up to 10 servers; ScatterGather saturates after a few shards.
| Dataset | Size | Servers | QPS Speedup (BatANN vs ScatterGather) | Mean Latency (ms) |
|---|---|---|---|---|
| BIGANN | 100M, 1B | 10 | 6.21–6.49× (100M), 5.10× (1B) | <6 |
| MSSPACEV | 100M, 1B | 10 | 6.21–6.49× (100M), 2.5× (1B) | <6 |
| DEEP | 100M | 10 | 6.49× | <6 |
At saturating QPS rates, BatANN maintains mean latency below 6 ms (rising only 13% from 5 to 10 servers), whereas ScatterGather’s tail latency degrades rapidly above 6K QPS.
5.4 Beam Width Ablation
The beam-width ablation shows that widening the beam reduces total and inter-server hops by approximately 4×, yielding over 2× higher QPS and roughly half the latency at 0.95 recall@10, with no extra CPU or I/O cost.
6. Limitations and Future Directions
- Message size: Each baton currently carries the entire result list; streaming only the beam state and piggybacking partial results is a proposed future optimization.
- Disk layout: Incorporating locality-aware disk placement (as in Starling’s 4 KB-aligned strategy) and dynamic pipeline width (as in PipeANN) could further minimize I/O and latency.
- Dynamic updates: Efficient distributed point insert/delete without full repartitioning remains unresolved; multi-node in-place update mechanisms have yet to be generalized.
- Fault tolerance: There is no current replication or fail-over; integration with state machine replication systems such as Derecho is a suggested extension.
- Network optimizations: Although commodity TCP suffices for multi-KB baton messages, the architecture would readily enable RDMA or CXL deployment, whose benefits are yet to be quantified.
- Multitenancy: Resource sharing and scheduling for multiple distinct vector search databases per cluster remain open research issues.
A plausible implication is that BatANN’s methodology may serve as a blueprint for distributed sublinear-time search over massive graph-structured data, given continued progress on distributed dynamic indexing and system-level robustness (Dang et al., 10 Dec 2025).