Graph-Based ANNS Indexes
- Graph-based ANNS indexes are scalable data structures that model data as proximity graphs to efficiently solve high-dimensional similarity search problems.
- They employ graph-centric navigation, connectivity strategies, and edge pruning to achieve rapid, near-logarithmic search times with greedy or best-first algorithms.
- Recent advancements include dynamic and streaming updates, hardware-aware optimizations, and distributed architectures to support billion-scale datasets and evolving workloads.
Graph-based Approximate Nearest Neighbor Search (ANNS) indexes are scalable data structures designed to efficiently solve high-dimensional similarity search problems by modeling the dataset as a proximity graph. Each data point is represented as a vertex, and edges encode local neighbor relationships—enabling rapid traversal via greedy or best-first algorithms. The evolution of these indexes is characterized by advances in their graph construction principles, search-time optimizations, support for dynamic workloads, and adaptation to large-scale, distributed, and heterogeneous hardware environments.
1. Core Design Principles of Graph-Based ANNS Indexes
Graph-based ANNS indexes are built upon several foundational principles that govern both their structure and operational characteristics:
- Graph-Centric Navigation: Proximity graphs (e.g., k-NN graphs, Monotonic Search Networks (MSNETs), Hierarchical Navigable Small World graphs (HNSW), Monotonic Relative Neighborhood Graphs (MRNG)) encode local neighborhoods to permit greedy navigation toward nearest neighbors, minimizing the need for exhaustive brute-force comparisons (Fu et al., 2017, Wang et al., 2021).
- Connectivity: Ensuring the entire graph is navigable from one or more “entry points” is critical. For instance, the Navigating Spreading-out Graph (NSG) ensures all vertices are reachable from a designated Navigating Node. Connectivity is often enforced by spanning tree procedures and targeted edge additions (Fu et al., 2017).
- Sparseness and Edge Selection: Sophisticated pruning strategies (e.g., MRNG’s “lune” region edge selection) keep the average (out-)degree per vertex low, reducing both index memory footprint and candidate expansion costs (Fu et al., 2017, Wang et al., 2021). Algorithms such as NSG, HNSW, and others use variants of relative neighborhood or “diversified neighbor” rules to avoid redundant edges.
- Short Search Paths: Path monotonicity (greedy walks that strictly approach the target) is desired to minimize the expected number of hops from entry to result. Theoretical results demonstrate that maintaining such monotonic paths yields near-logarithmic expected search times in high dimensions (Fu et al., 2017).
- Scalability: Design choices reflect the need to scale to billion-node graphs, with careful control of index size, memory usage, and efficient support for distributed or external storage (Fu et al., 2017, Ni et al., 2023, Shi et al., 28 Feb 2025).
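The graph-centric navigation principle above can be made concrete with a minimal sketch. This is an illustrative toy, not code from any cited system: the four-point example graph, `greedy_search`, and all names are hypothetical, and real indexes add entry-point heuristics and candidate pools on top of this basic walk.

```python
import math

def greedy_search(graph, vectors, query, entry):
    """Greedy navigation: repeatedly hop to whichever neighbor of the
    current vertex is closest to the query; stop when no neighbor improves."""
    current = entry
    current_d = math.dist(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = math.dist(vectors[nb], query)
            if d < current_d:
                current, current_d = nb, d
                improved = True
    return current

# Tiny hypothetical index: four points on a line, chained by edges.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nearest = greedy_search(graph, vectors, query=(2.9, 0.0), entry=0)
```

A pure greedy walk like this can stall in a local minimum, which is why the sparseness and monotonicity properties discussed above matter: they shape the graph so that greedy hops keep making progress toward the target.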
2. Index Construction Methodologies and Optimizations
Constructing an effective graph-based ANNS index involves a series of algorithmic choices:
| Construction Aspect | Example Method/Paper | Key Technical Features |
|---|---|---|
| Seed kNN Graph | NN-Descent, Faiss | Initialization of candidate edges |
| Edge Pruning | MRNG, RNG, HNSW | "Lune" rule, heuristics, pruning |
| Connectivity Correction | DFS/tree spanning, edge additions | Guarantees reachability |
| Parallelism & Distribution | SOGAIC (Shi et al., 28 Feb 2025) | Overlap-aware partitioning, distributed merging |
| Hardware-specific Optimization | GGNN (Groh et al., 2019), Flash (Wang et al., 25 Feb 2025) | GPU acceleration, SIMD-aware codes |
- NSG constructs its index by starting from a high-quality approximate kNN graph, extracting candidates for each node via greedy search from a medoid, and pruning candidates using recursively applied MRNG conditions, followed by connectivity correction (Fu et al., 2017).
- Relative NN-Descent combines NN-descent's neighbor refinement with RNG-style pruning to directly generate sparse graphs, reducing construction time by 2× compared to NSG without degrading search quality (Ono et al., 2023).
- Flash (Wang et al., 25 Feb 2025) accelerates indexing by replacing high-dimensional floating-point vectors with compact SIMD-optimized codes, allowing for 10×–22× index-building speedups without recall loss.
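The edge-pruning step shared by MRNG, NSG, and HNSW can be sketched as an occlusion test: a candidate edge survives only if the candidate is closer to the base point than to every neighbor already selected. The function below is a simplified illustration of that rule, not the exact procedure of any cited method; the example points are hypothetical.

```python
import math

def rng_prune(point, candidates, vectors, max_degree):
    """Diversified-neighbor selection in the spirit of RNG/MRNG pruning:
    scan candidates in order of distance to `point`; keep a candidate only
    if it is closer to `point` than to every already-selected neighbor
    (i.e., it is not 'occluded' by an existing edge)."""
    ordered = sorted(candidates, key=lambda c: math.dist(vectors[point], vectors[c]))
    selected = []
    for c in ordered:
        if len(selected) >= max_degree:
            break
        if all(math.dist(vectors[point], vectors[c]) < math.dist(vectors[s], vectors[c])
               for s in selected):
            selected.append(c)
    return selected

# Hypothetical example: candidates 1 and 2 point in nearly the same
# direction from 0, so the rule keeps only one of them plus the
# orthogonal candidate 3, yielding a sparse but diverse neighborhood.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.1, 0.1), 3: (0.0, 1.0)}
kept = rng_prune(0, [1, 2, 3], vectors, max_degree=4)
```

The redundant near-duplicate edge (to point 2) is dropped, which is exactly how these rules keep the average out-degree low without sacrificing reachability.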
3. Search Algorithms and Performance Guarantees
The canonical search over graph-based ANNS indexes proceeds via greedy or best-first expansion from a chosen entry node. The process and its theoretical guarantees are influenced by index properties:
- Monotonicity: If the index graph is an MSNET, greedy search with no backtracking is guaranteed to converge to the destination, as each hop approaches the target (Fu et al., 2017). The expected search path length is shown to be O(n^(1/d) log n^(1/d)) for n points in d dimensions under mild conditions.
- Entry Point Selection: Query-sensitive entry vertex mechanisms, as in DiskANN++ (Ni et al., 2023) or GATE (Ruan et al., 19 Jun 2025), reduce routing distance and required I/O by selecting an initial node close to the query.
- Routing Strategies: Beyond simple heuristics, recent work introduces probabilistic routing operators such as PEOs, which use multiple random projections and provide formal guarantees that relevant neighbors are pruned only with a small, bounded failure probability (Lu et al., 17 Feb 2024).
- Parameter Adaptation: Automated, query-dependent adjustment of search parameters (candidate pool size, degree, prefetch configuration) can be realized without index rebuilding, as demonstrated in VSAG (Zhong et al., 23 Mar 2025).
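The canonical best-first search with a bounded candidate pool, the backbone behind the strategies above, can be sketched as follows. This is a generic illustration (the pool-size parameter is named `ef` after HNSW's convention); the example graph and all helper names are hypothetical.

```python
import heapq
import math

def beam_search(graph, vectors, query, entry, ef, k):
    """Best-first search with a bounded candidate pool: always expand the
    closest unexpanded candidate; stop once the closest candidate is
    farther than the worst element of the current top-ef result pool."""
    d0 = math.dist(vectors[entry], query)
    candidates = [(d0, entry)]    # min-heap of frontier vertices
    results = [(-d0, entry)]      # max-heap (negated) holding best ef so far
    visited = {entry}
    while candidates:
        d, v = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                 # no remaining candidate can improve the pool
        for nb in graph[v]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # evict current worst
    return sorted((-d, v) for d, v in results)[:k]

# Hypothetical line-graph index; query lands near vertex 3.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
top = beam_search(graph, vectors, (2.9, 0.0), entry=0, ef=3, k=2)
```

Raising `ef` widens the pool and trades latency for recall, which is precisely the kind of knob that systems like VSAG tune automatically per query.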
4. Dynamic and Streaming Updates
Many real-world scenarios require the vector index to absorb insertions and deletions without full rebuilds:
- Online Updates: FreshDiskANN (Singh et al., 2021) and IP-DiskANN (Xu et al., 19 Feb 2025) permit concurrent insert, delete, and search by maintaining patch structures or processing deletions in-place, demonstrating stable recall and high throughput.
- Workload-Aware and Query-Adaptive Consolidation: CleANN (Zhang et al., 26 Jul 2025) introduces workload-aware linking (prioritizing frequently queried regions), query-adaptive neighborhood consolidation, and semi-lazy memory cleaning. This results in dynamic index robustness and up to 1,200× throughput advantages compared to methods not tuned for full dynamism under concurrent query and update conditions.
- Adaptive Awareness and Query Distribution Alignment: GATE (Ruan et al., 19 Jun 2025) overlays adaptive topology and query-awareness modules using contrastive learning to optimize entry point selection, offering 1.2–2.0× speedups.
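A common pattern underlying lazy deletion in dynamic indexes can be sketched with tombstones: deletions are O(1) markers filtered at query time, and a later consolidation pass rewires edges. This is a heavily simplified toy in the general spirit of FreshDiskANN-style deferred deletion, not the actual algorithm of any cited system; the class and example graph are hypothetical.

```python
class DynamicGraphIndex:
    """Toy streaming index: deletions are tombstoned and filtered at query
    time; consolidate() later rewires neighbors of deleted nodes so that
    connectivity survives their removal (no pruning, for brevity)."""

    def __init__(self, graph, vectors):
        self.graph, self.vectors = graph, vectors
        self.deleted = set()

    def delete(self, v):
        self.deleted.add(v)          # O(1): no graph surgery yet

    def neighbors(self, v):
        # Search-time view: tombstoned vertices are invisible.
        return [nb for nb in self.graph[v] if nb not in self.deleted]

    def consolidate(self):
        """Each live node inherits the live neighbors of its deleted
        neighbors; then tombstoned nodes are physically dropped."""
        for v in list(self.graph):
            if v in self.deleted:
                continue
            new_edges = []
            for nb in self.graph[v]:
                if nb in self.deleted:
                    new_edges.extend(x for x in self.graph[nb]
                                     if x != v and x not in self.deleted)
                else:
                    new_edges.append(nb)
            self.graph[v] = sorted(set(new_edges))
        for v in self.deleted:
            self.graph.pop(v, None)
            self.vectors.pop(v, None)
        self.deleted.clear()

# Delete the middle of a chain 0-1-2-3; consolidation bridges the gap.
idx = DynamicGraphIndex({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
                        {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,)})
idx.delete(2)
live_view = idx.neighbors(1)   # 2 is already invisible to searches
idx.consolidate()
```

Real systems batch this consolidation and bound the inherited edge set with pruning rules; the point here is only the separation between cheap logical deletion and deferred physical repair.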
5. Hardware-Aware and Distributed Architectures
Modern deployment at scale often requires adapting graph indexes to distributed, out-of-core, or accelerator hardware:
| System/Paper | Setting | Key Innovations |
|---|---|---|
| GGNN (Groh et al., 2019) | GPU-based | Hierarchical index construction, NN-graph push-pull |
| AiSAQ (Tatsuno et al., 9 Apr 2024) | DRAM-free, SSD-resident | PQ-compressed graph, aggressive memory optimization |
| SHINE (Widmoser et al., 23 Jul 2025) | Disaggregated memory | RDMA-aware HNSW, logical cache coordination |
| SOGAIC (Shi et al., 28 Feb 2025) | Billion-scale clusters | Overlap-aware partitioning, distributed subgraph merging |
| RoarGraph (Chen et al., 16 Aug 2024) | Cross-modal OOD search | Query-distribution guidance for bipartite graph construction |
All-in-storage systems like AiSAQ reduce DRAM usage to ~10 MB even for billion-scale datasets. SHINE distributes global, graph-preserving HNSW graphs across memory nodes and compute nodes using RDMA, leveraging logical index partitioning to combine and balance caches, ensuring accuracy and high throughput.
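The product-quantization (PQ) compression that enables DRAM-free designs like AiSAQ rests on a simple mechanism: each vector is stored as a few centroid indices, and query distances are computed from precomputed lookup tables instead of full floating-point vectors. The sketch below uses tiny hand-picked codebooks purely for illustration (real systems learn them with k-means per subspace); all names and values are hypothetical.

```python
import math

# Toy codebooks: 2 subspaces of dim 2, 2 centroids each (assumed
# pretrained; real systems learn these with per-subspace k-means).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],   # subspace 0 (dims 0-1)
    [(0.0, 0.0), (1.0, 1.0)],   # subspace 1 (dims 2-3)
]

def split(v):
    return [v[0:2], v[2:4]]

def pq_encode(v):
    """Replace each subvector by the index of its nearest centroid,
    compressing a 4-float vector into 2 small integer codes."""
    return tuple(
        min(range(len(cb)), key=lambda i: math.dist(sub, cb[i]))
        for sub, cb in zip(split(v), CODEBOOKS)
    )

def adc_tables(query):
    """Precompute squared distances from each query subvector to every
    centroid: one small table per subspace, built once per query."""
    return [[math.dist(sub, c) ** 2 for c in cb]
            for sub, cb in zip(split(query), CODEBOOKS)]

def adc_distance(code, tables):
    """Asymmetric distance: a sum of table lookups, so the original
    floating-point vector never needs to be loaded from storage."""
    return sum(t[c] for t, c in zip(tables, code))

code = pq_encode((0.9, 1.1, 0.1, 0.0))     # compressed representation
tables = adc_tables((1.0, 1.0, 0.0, 0.0))  # built once per query
approx_d = adc_distance(code, tables)
```

During graph traversal, only these compact codes are touched per candidate, which is what lets billion-scale indexes keep their working set tiny.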
6. Evaluation Methodologies and Robustness Measures
The assessment of graph-based ANNS indexes relies on:
- Search Recall & Latency: Recall@k (the fraction of the true top-k neighbors retrieved) and QPS (queries per second) are the core metrics. NSG, DiskANN++, and others report empirical near-logarithmic complexity and high precision at sub-10 ms latency (Fu et al., 2017, Ni et al., 2023).
- Scalability: Deployment on 10-billion–scale datasets (SOGAIC, SHINE) and resource constraint adaptation are now standard evaluation axes (Shi et al., 28 Feb 2025, Widmoser et al., 23 Jul 2025).
- Query Hardness and Robustness: Distance-based query hardness metrics (e.g., local intrinsic dimensionality) are now complemented (and sometimes supplanted) by graph-native measures such as Steiner-hardness, which better predict the minimum query “effort” by considering graph connectivity and are highly correlated (coefficients up to 0.98) with actual query costs (Wang et al., 25 Aug 2024).
These directions enable new, less biased workload generation for more rigorous stress-testing of index robustness, and inform the design of algorithmic adaptation to “hard” queries.
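For reference, the Recall@k metric used throughout these evaluations is straightforward to compute; the helper and example below are illustrative, not from any cited benchmark.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Recall@k: fraction of the exact top-k neighbors that appear in the
    index's returned top-k (order within the top-k does not matter)."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

# Hypothetical run: the index returned [3, 2, 7]; the exact top-3 is [3, 2, 1].
r = recall_at_k([3, 2, 7], [3, 2, 1], 3)
```

Hardness-aware evaluation then stratifies this metric by per-query difficulty scores such as Steiner-hardness, rather than reporting a single dataset-wide average.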
7. Open Challenges and Future Directions
Despite extensive progress, several challenges remain active topics of research:
- Handling Out-of-Distribution (OOD) Queries: New methods (e.g., OOD-DiskANN (Jaiswal et al., 2022), RoarGraph (Chen et al., 16 Aug 2024)) explicitly incorporate small OOD query samples during graph construction to improve search quality when the query and index distributions differ.
- Efficient Constrained Search: Most graph-based designs are not optimized for complex filter predicates or multi-attribute queries; partition-based approaches such as CAPS (Gupta et al., 2023) offer practical alternatives.
- Hybrid and Learned Indexes: Differentiable, routing-guided quantization (e.g., RPQ (Yue et al., 2023)) integrates graph structure into compression, leading to improved QPS—empirically up to 4.2× over standard PQ when maintaining the same recall.
- Online Learning and Adaptivity: Systems such as VSAG (Zhong et al., 23 Mar 2025) and GATE (Ruan et al., 19 Jun 2025) point to a future of indexes that adapt at runtime to hardware profiles, workload drift, and evolving query distributions, potentially in a plug-and-play fashion over existing graph indexes.
- Ultra-Scalable Construction: Adaptive overload-aware partitioning and distributed merging (SOGAIC (Shi et al., 28 Feb 2025)), as well as hardware-efficient indexing (Flash (Wang et al., 25 Feb 2025)), are necessary for building indexes for truly massive and dynamic datasets, while new approaches target even faster in-place and fully dynamic graph maintenance (CleANN (Zhang et al., 26 Jul 2025)).
A plausible implication is that future systems will blend on-the-fly adaptive topology, query- and workload-awareness, and hybrid in-memory-plus-SSD graph representations, guided by both structural (e.g., monotonicity, robustness) and data-distributional properties. The persistent challenge remains balancing index construction time, search throughput, memory and storage footprint, and resilience across changing data distributions and system environments.