HNSW: Efficient Graph-Based ANN Search
- HNSW is a graph-based algorithm that organizes data into hierarchical, navigable small-world layers for efficient approximate nearest neighbor search.
- It constructs multiple layers using a randomized, greedy insertion process with beam search at the base level, achieving near-logarithmic search complexity.
- Recent variants enhance performance with skip bridges, dual-branch structures, and RDMA adaptations for faster updates and lower memory usage.
The Hierarchical Navigable Small World (HNSW) algorithm is a graph-based approach for approximate nearest neighbor (ANN) search that exploits hierarchical proximity graphs to achieve efficient, scalable retrieval in general metric spaces. HNSW is characterized by a multi-layer structure in which each layer is an independently navigable small-world graph over a subset of the data points. Search complexity is near-logarithmic with high empirical recall, making HNSW the prevailing paradigm in dense vector and AI retrieval workloads.
1. Core Structure and Construction of HNSW
HNSW organizes data points into a hierarchy of proximity graphs: layers indexed by $\ell = 0, 1, 2, \dots$, with layer 0 covering all points and higher layers containing progressively sparser samples. Each point is assigned a random maximum layer drawn from a geometric-like distribution, typically $\ell = \lfloor -\ln(\mathrm{unif}(0,1)) \cdot m_L \rfloor$ with $m_L = 1/\ln M$, where $m_L$ controls the decay (Malkov et al., 2016).
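A minimal sketch of this level draw, assuming the standard choice $m_L = 1/\ln M$ (function and variable names are illustrative):

```python
import math
import random

def assign_level(M: int = 16) -> int:
    """Draw a node's top layer: l = floor(-ln(unif(0,1)) * m_L), with m_L = 1/ln(M)."""
    m_L = 1.0 / math.log(M)
    return int(-math.log(1.0 - random.random()) * m_L)  # 1 - U avoids log(0)

# Most points stay on layer 0; the fraction reaching layer l decays geometrically.
levels = [assign_level() for _ in range(100_000)]
print({l: levels.count(l) for l in sorted(set(levels))})
```

With $M = 16$, roughly a fraction $1/M$ of the points reach layer 1, $1/M^2$ reach layer 2, and so on.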
Insertion proceeds as follows:
- For each new point $p$ with assigned top layer $\ell$:
- Greedy descent from the current global entry point at the highest layer down to layer $\ell + 1$, moving at each step to the neighbor closest to $p$ in the present layer.
- In each layer from $\ell$ down to 0, perform a local search (beam width $efConstruction$) to identify candidate neighbors.
- Out of these candidates, select up to $M$ neighbors using the "heuristic pruning" procedure (relative neighborhood criterion): retain a candidate $c$ only if $d(c, p) < d(c, s)$ for every already-chosen neighbor $s$ (a minimal sketch follows this list). Edges are bidirectional.
- Update the global entry point if the new point's level is highest.
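A minimal sketch of the neighbor-selection heuristic, assuming a generic distance callable (names are illustrative, not any library's API):

```python
import math

def select_neighbors_heuristic(p, candidates, M, dist):
    """Keep at most M diverse neighbors for the new point p: a candidate c is
    retained only if it is closer to p than to every neighbor already chosen
    (the relative-neighborhood criterion)."""
    selected = []
    for c in sorted(candidates, key=lambda c: dist(p, c)):
        if len(selected) >= M:
            break
        if all(dist(c, p) < dist(c, s) for s in selected):
            selected.append(c)
    return selected

# Example in the plane: the two nearly collinear points block each other,
# so the more distant but diverse point is kept instead.
print(select_neighbors_heuristic((0.0, 0.0),
                                 [(1.0, 0.0), (1.1, 0.1), (0.0, 2.0)],
                                 M=2, dist=math.dist))
```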
The randomized level assignment induces a skip-list-like hierarchy. Upper layers facilitate global traversal via long-range links, while layer 0 is densely connected for local refinement. The expected number of layers a data point participates in is a small constant ($1 + m_L$), and the total indexing cost is $O(n \log n)$ in practice (Malkov et al., 2016, Ma et al., 2023).
2. Search Procedure and Complexity
HNSW answers a $k$-NN query through two stages:
- Hierarchical Greedy Descent: Starting at the global entry point in the top layer, repeatedly move to the neighbor closest to the query until no closer neighbor is found; descend to the next layer and repeat.
- Layer-0 Best-First Search: At the base layer, maintain a result max-heap bounded at size $ef$ (efSearch) together with a min-heap of candidates; iteratively expand the closest candidate, scan its neighbors, and insert improving candidates until no remaining candidate can improve the result set (a minimal sketch follows).
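A minimal sketch of the base-layer search, assuming nodes are hashable points and `neighbors(v)` returns the adjacency list of node `v` (illustrative names, not a library API):

```python
import heapq

def beam_search_layer0(query, entry_point, ef, neighbors, dist):
    """Best-first search with a min-heap frontier and a max-heap of the best ef
    results (distances negated); stops when the nearest unexplored candidate
    cannot improve the current result set."""
    d0 = dist(query, entry_point)
    frontier = [(d0, entry_point)]        # min-heap of candidates to expand
    results = [(-d0, entry_point)]        # bounded max-heap of best ef nodes
    visited = {entry_point}
    while frontier:
        d, v = heapq.heappop(frontier)
        if d > -results[0][0]:
            break                         # frontier can no longer improve results
        for u in neighbors(v):
            if u in visited:
                continue
            visited.add(u)
            du = dist(query, u)
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-negd, u) for negd, u in results)
```

During construction, the same routine with `ef` set to $efConstruction$ supplies the candidate pool for neighbor selection.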
The search complexity is empirically $O(\log n)$ for random and embedding-based workloads, and remains sublinear even at millions of points. The memory footprint is $O(M \cdot n)$, though most edges reside in the base layer (Malkov et al., 2016, Ma et al., 2023).
3. Neighbor Selection, Connectivity, and Pruning
Neighbor selection relies on the diverse-neighbors rule (relative neighborhood heuristic): no two chosen neighbors "block" each other, which keeps the graph well-connected and avoids local cluster traps (Malkov et al., 2016). During construction, increasing $M$ or $efConstruction$ yields denser graphs and better recall but raises memory and indexing time roughly linearly; the heuristic is essential for good performance on highly clustered and high-dimensional data (Malkov et al., 2016, Elliott et al., 28 May 2024).
The standard HNSW (pruning parameter $\alpha = 1$ in RobustPrune terms) lacks theoretical shortcut guarantees: pathological clustering or adversarial insertion order can trap greedy search in suboptimal regions, yielding near-linear worst-case query time on low-dimensional hard instances (Indyk et al., 2023). Some extensions (DiskANN) employ $\alpha$-pruning with $\alpha > 1$ to establish shortcut properties and guarantee polylogarithmic search (Indyk et al., 2023).
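For contrast with the HNSW heuristic sketched in Section 1, a minimal sketch of DiskANN-style $\alpha$-pruning; setting $\alpha = 1$ recovers the relative-neighborhood rule, while $\alpha > 1$ prunes less aggressively (names are illustrative):

```python
def robust_prune(p, candidates, M, dist, alpha=1.2):
    """Select up to M neighbors for p. A selected neighbor s eliminates a
    remaining candidate c only when alpha * d(s, c) <= d(p, c); larger alpha
    keeps longer 'shortcut' edges that underpin the theoretical guarantees."""
    selected = []
    remaining = sorted(candidates, key=lambda c: dist(p, c))
    while remaining and len(selected) < M:
        s = remaining.pop(0)
        selected.append(s)
        remaining = [c for c in remaining if alpha * dist(s, c) > dist(p, c)]
    return selected
```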
4. Hierarchical Structure, the Hub-Highway Hypothesis, and Alternatives
The effectiveness of HNSW's hierarchy in modern high-dimensional ANN workloads is debated. Recent studies show that for large-scale, high-dimensional data, a flat navigable small world graph (FlatNav) matches HNSW's latency and recall while reducing memory usage by 38% (Munyampirwa et al., 2 Dec 2024). The underlying mechanism is the "Hub Highway Hypothesis": high-dimensional embedding spaces naturally yield a set of "hub" nodes that appear among the k nearest neighbors of many points and form a densely interconnected "highway." Greedy search rapidly enters this subgraph, obviating the need for an explicit hierarchy; multi-layer skipping does not improve traversal.
Empirical measurements demonstrate:
- No measurable latency difference between HNSW and FlatNav on the major benchmarks (Munyampirwa et al., 2 Dec 2024).
- The majority of beam-search visits occur on "hub highway" nodes (50–70% in early expansion).
This suggests that the explicit hierarchy is dispensable for high-dimensional ANN; explicitly constructing or favoring hubs could provide equivalent performance at lower cost (Munyampirwa et al., 2 Dec 2024).
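The hubness effect itself is easy to reproduce. A minimal numpy sketch (brute-force k-NN on synthetic Gaussian data, not the paper's benchmark setup) showing that a small set of points absorbs a disproportionate share of neighbor slots:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 128, 10
X = rng.standard_normal((n, d))

# Exact k-NN graph by brute force; a point's "in-degree" is the number of
# other points that list it among their k nearest neighbors.
sq = (X ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
np.fill_diagonal(d2, np.inf)
knn = np.argsort(d2, axis=1)[:, :k]
in_degree = np.bincount(knn.ravel(), minlength=n)

# Under the Hub Highway Hypothesis, these high in-degree nodes form the
# densely linked subgraph that greedy search enters almost immediately.
print("mean in-degree:", in_degree.mean())                     # equals k
print("top-1% mean in-degree:", np.sort(in_degree)[-n // 100:].mean())
```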
5. Parameter Sensitivity and Impact of Data Characteristics
HNSW performance depends on:
- Intrinsic dimensionality: Data with high intrinsic or local intrinsic dimensionality (LID) requires larger $efSearch$; recall degrades linearly as LID increases (Elliott et al., 28 May 2024). LID is estimated per point via the maximum-likelihood estimator over its $k$ nearest neighbors, $\widehat{\mathrm{LID}}(x) = -\left(\tfrac{1}{k}\sum_{i=1}^{k}\ln\tfrac{r_i(x)}{r_k(x)}\right)^{-1}$, where $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbor (see the sketch after this list).
- Insertion order: Inserting high-LID or category-diverse points early ("annealing") can boost recall by up to 12 percentage points. Descending-LID order provides +2.6pp (HNSWLib) and +6.2pp (FAISS) average recall@10 over random, with swings up to 7.7pp in real-world image retrieval (Elliott et al., 28 May 2024).
- Default parameters: library defaults for $M$, $efConstruction$, and $efSearch$ (e.g., $M = 16$, $efConstruction = 200$ in hnswlib) are typical in deployed systems (Elliott et al., 28 May 2024). Larger values improve recall but incur trade-offs in speed and memory.
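A minimal sketch of the MLE estimator above and of a descending-LID insertion order, assuming Euclidean distance and brute-force neighbor computation (names are illustrative):

```python
import numpy as np

def lid_mle(X: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-point LID estimate: -1 / mean_i ln(r_i / r_k), computed over the k
    nearest-neighbor distances r_1 <= ... <= r_k of each point."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    r = np.sqrt(np.maximum(np.sort(d2, axis=1)[:, :k], 0.0))
    r = np.maximum(r, 1e-12)                     # guard against duplicate points
    return -1.0 / np.mean(np.log(r / r[:, -1:]), axis=1)

X = np.random.default_rng(0).standard_normal((2000, 64)).astype(np.float32)
order = np.argsort(-lid_mle(X))                  # insert high-LID points first
```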
6. Recent Variants and Practical Extensions
Dual-Branch HNSW++ and LID-Driven Optimization
HNSW++ partitions the dataset into two branches, each with its own hierarchy, merging at layer 0. Layer assignments preferentially select high-LID points for upper layers. The bridge-building technique allows direct traversal from high layers to layer 0 ("skip bridges") when searches reach sparse regions (Nguyen et al., 23 Jan 2025). Key findings:
- Recall enhancement: +18% (NLP), +30% (CV datasets).
- Construction time reduction: 16–20% vs vanilla HNSW.
- Dual-branch and LID insertion mitigate cluster disconnections and local minima.
Disaggregated Memory and d-HNSW
d-HNSW adapts HNSW for RDMA disaggregated architectures. Optimal performance is achieved via meta-HNSW representative caching, RDMA-friendly contiguous data layout, and batch/deduplicated loading (Liu et al., 17 May 2025). Benchmarks on SIFT1M at recall 0.87 demonstrate up to 117× latency reduction over naïve memory layouts.
Predicate-Agnostic DBMS Filtering
NaviX extends HNSW in graph DBMSs to robustly support filtered kNN queries. It implements adaptive, locally selective candidate expansion to maintain search efficiency and recall under varying filter selectivities and correlation structures (Sehgal et al., 29 Jun 2025).
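NaviX's adaptive expansion strategy is not reproduced here; the following generic sketch only illustrates the filtered-search setting it targets, i.e., expanding through non-matching nodes for navigation while admitting only predicate-matching nodes to the result set (illustrative names, same assumptions as the earlier base-layer sketch):

```python
import heapq

def filtered_beam_search(query, entry_point, ef, neighbors, dist, predicate):
    """Like the base-layer search, but only nodes satisfying `predicate` may
    enter the result heap; non-matching nodes are still expanded so the
    traversal is not stranded when the filter is selective."""
    d0 = dist(query, entry_point)
    frontier, visited = [(d0, entry_point)], {entry_point}
    results = [(-d0, entry_point)] if predicate(entry_point) else []
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break
        for u in neighbors(v):
            if u in visited:
                continue
            visited.add(u)
            du = dist(query, u)
            heapq.heappush(frontier, (du, u))    # navigate through everything
            if predicate(u) and (len(results) < ef or du < -results[0][0]):
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-negd, u) for negd, u in results)
```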
Real-Time Updates, Deletions, and Unreachable Points
MN-RU (Mutual-Neighbor Replaced Update) improves HNSW update efficiency and suppresses the growth of unreachable points during dynamic deletions/insertions (Xiao et al., 10 Jul 2024). Key mechanisms include restricted repair to mutual neighbors and backup dual-index management. MN-RU delivers 2–4× faster updates and stable recall under high-frequency modification workloads.
7. Applications and Practical Optimization
HNSW is the default ANN engine in Lucene (via Anserini), Weaviate, Milvus, FAISS, and other vector DBs (Ma et al., 2023, Elliott et al., 28 May 2024). In Lucene, careful tuning of $M$, $efConstruction$, and $efSearch$ strikes a trade-off between recall (MRR@10 ~ 0.38) and QPS (30+). Multi-threaded indexing and segment merging ("optimize") are essential for low-latency deployments. Sparse vector support and GPU variants (SONG) further accelerate HNSW in large-scale EBR (embedding-based retrieval) (Li et al., 2023).
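A minimal hnswlib example wiring these knobs together (the parameter values are illustrative defaults, not tuned settings):

```python
import numpy as np
import hnswlib

dim, n = 128, 100_000
data = np.random.default_rng(0).standard_normal((n, dim)).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)          # "ip" and "cosine" also supported
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, num_threads=8)                # multi-threaded indexing

index.set_ef(100)                                   # efSearch: recall vs. latency knob
labels, distances = index.knn_query(data[:10], k=10)
```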
Memory-access optimization: Post-construction graph reordering (Gorder, RCM) reorganizes memory layout, reducing cache misses and query time by up to 40% at high recall (Coleman et al., 2021). Reordering is highly recommended for large, static indices.
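RCM reordering can be prototyped directly with SciPy; a minimal sketch that relabels the nodes of a (bidirectional) base-layer adjacency matrix and permutes the vector storage accordingly (Gorder requires a dedicated implementation and is not shown):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_reorder(vectors: np.ndarray, adjacency: sp.csr_matrix):
    """Compute a Reverse Cuthill-McKee ordering so graph neighbors tend to be
    stored close together, then permute vectors and edges consistently."""
    perm = reverse_cuthill_mckee(adjacency, symmetric_mode=True)  # edges are bidirectional
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(len(perm))     # maps old node id -> new node id
    return vectors[perm], adjacency[perm][:, perm], inverse
```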
References
- (Malkov et al., 2016): Malkov & Yashunin, Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- (Indyk et al., 2023): Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations
- (Munyampirwa et al., 2 Dec 2024): Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs"
- (Nguyen et al., 23 Jan 2025): Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization
- (Liu et al., 17 May 2025): Efficient Vector Search on Disaggregated Memory with d-HNSW
- (Sehgal et al., 29 Jun 2025): NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance
- (Xiao et al., 10 Jul 2024): Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation
- (Ma et al., 2023): Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes
- (Li et al., 2023): Practice with Graph-based ANN Algorithms on Sparse Data: Chi-square Two-tower model, HNSW, Sign Cauchy Projections
- (Coleman et al., 2021): Graph Reordering for Cache-Efficient Near Neighbor Search
- (Elliott et al., 28 May 2024): The Impacts of Data, Ordering, and Intrinsic Dimensionality on Recall in Hierarchical Navigable Small Worlds