
Dual-Branch HNSW++

Updated 9 February 2026
  • The paper introduces a dual-branch ANN search architecture that interleaves two layered proximity graphs to overcome cluster disconnection and suboptimal greedy search.
  • HNSW++ employs LID-driven layer and branch assignments, yielding a recall boost of up to 18% and approximately 20% reduction in build time.
  • The skip-bridge mechanism efficiently bypasses intermediate layers under specific conditions, preserving logarithmic search complexity while accelerating query times in high-dimensional data.

The Dual-Branch HNSW (commonly referenced as HNSW++ in the literature) approach augments the standard Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor (ANN) search by employing a dual structure of interleaved proximity graphs, Local Intrinsic Dimensionality (LID)-driven optimization, and a skip-bridge mechanism. This architecture addresses limitations of standard HNSW in cluster disconnection, suboptimal greedy search, and scalability concerns, particularly for high-dimensional datasets. HNSW++ achieves higher recall, faster index construction, and preserves logarithmic search complexity via these innovations (Nguyen et al., 23 Jan 2025).

1. Dual-Branch HNSW Architecture

HNSW++ partitions the data set into two interleaved "branches," each forming its own layered proximity graph. For each node $q$, two assignments are determined: the graph layer $\ell_q$ (derived from normalized LID, see Section 3) and the branch label $b_q \in \{0,1\}$, ensuring each layer in both branches contains approximately half of all nodes.

During index construction, layer and branch assignment proceeds in alternation over the sorted (descending) normalized LID values, distributing connectivity responsibilities. At query time, the ANN search launches from the highest layer in both branches, executing a parallel, branch-local greedy descent. Each branch accumulates candidate neighbors as it traverses downward; upon reaching layer 0, the candidates from both branches are exchanged (excluding duplicates) and merged, yielding the final top-$k$ neighbors by minimal distance.

Key motivations for this structure are:

  • Reducing the probability that greedy search trajectories in both branches encounter the same local optimum.
  • Improving outlier and disconnected component coverage.
  • Accelerating construction: per-branch insertions touch approximately half as many nodes, yielding a 20% empirical reduction in build time.
  • Preserving recall, since layer-0 merging restores global graph connectivity (Nguyen et al., 23 Jan 2025).
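The layer-0 merge described above can be sketched as follows; this is a minimal illustration assuming each branch's greedy descent has already produced a list of (distance, node_id) candidate pairs (the function and variable names are illustrative, not from the paper):

```python
import heapq

def merge_branch_candidates(cands_a, cands_b, k):
    """Merge layer-0 candidate lists from the two branches.

    cands_a, cands_b: lists of (distance, node_id) pairs produced by
    each branch's greedy descent. Nodes reached by both branches are
    kept once; the k closest survivors are returned by minimal distance.
    """
    best = {}  # node_id -> smallest distance seen for that node
    for dist, node in cands_a + cands_b:
        if node not in best or dist < best[node]:
            best[node] = dist
    # Final top-k selection, roughly the O(k log k) merge cost.
    return heapq.nsmallest(k, ((d, n) for n, d in best.items()))

# Example: the two branches overlap on node 3.
a = [(0.10, 3), (0.40, 7), (0.90, 1)]
b = [(0.10, 3), (0.25, 5), (0.70, 2)]
print(merge_branch_candidates(a, b, k=3))  # → [(0.1, 3), (0.25, 5), (0.4, 7)]
```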

2. Skip-Bridge Mechanism

To mitigate the exhaustive traversal of graph layers inherent in standard HNSW, HNSW++ introduces the skip-bridge. Skip-bridges allow a search to bypass multiple intermediate layers and jump directly to layer 0 under specific conditions. For a branch at layer $\ell > 0$, the skip-bridge is triggered if the current entry point $e$ satisfies

\mathrm{LID}(e) > T \quad \text{and} \quad d(e, x_q) < \varepsilon,

where $T$ is a tunable LID threshold (e.g., $0.8$ after normalization) and $\varepsilon$ is a distance threshold, typically the average inter-point distance.

This skip operation reduces the expected number of traversed layers from $L_{\max}$ to

\mathbb{E}[\#\text{layers}] = L_{\max}(1 - P_{\mathrm{skip}}),

where $P_{\mathrm{skip}}$ denotes the empirical probability that the jump condition is met. The asymptotic $\mathcal{O}(\log N)$ search complexity is preserved, but query-time constants are reduced, with particular benefit in high-dimensional regimes (Nguyen et al., 23 Jan 2025).
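The trigger condition is simple enough to state directly in code; the following sketch assumes the entry point's normalized LID and its distance to the query have already been computed (names and defaults are illustrative):

```python
def should_skip_to_layer0(entry_lid, entry_dist, T=0.8, eps=1.0):
    """Skip-bridge trigger: jump from the current layer straight to
    layer 0 when the entry point has high normalized LID and is
    already close to the query.

    entry_lid  -- normalized LID of the current entry point e
    entry_dist -- d(e, x_q), distance from e to the query
    T          -- LID threshold (the paper suggests ~0.8 normalized)
    eps        -- distance threshold (e.g., average inter-point distance)
    """
    return entry_lid > T and entry_dist < eps

print(should_skip_to_layer0(0.9, 0.5))  # high-LID, nearby entry → True
print(should_skip_to_layer0(0.6, 0.5))  # LID below threshold → False
```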

3. Local Intrinsic Dimensionality (LID) & LID-Driven Insertion

LID, estimated for each point by maximum likelihood as

\mathrm{LID}(x) = \left( \frac{1}{k-1} \sum_{i=1}^{k-1} \ln \frac{d_k}{d_i} \right)^{-1},

(where $d_i$ is the distance to the $i$th nearest neighbor and $d_k$ to the $k$th), informs both graph-layer and branch assignments. After min–max normalization, nodes with higher LID values are interpreted as residing in sparser, more structurally critical regions. These nodes are preferentially assigned to higher layers to maximize inter-cluster connectivity and to reduce the risk that greedy search becomes confined within dense local regions.
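The MLE estimator above can be computed directly from a point's sorted k-nearest-neighbor distances; the following is a minimal sketch (the neighbor search that produces the distances is out of scope):

```python
import math

def lid_mle(knn_dists):
    """Maximum-likelihood LID estimate from kNN distances.

    knn_dists: distances to the k nearest neighbors of x in ascending
    order, so knn_dists[-1] is d_k. Implements
    ( (1/(k-1)) * sum_{i=1}^{k-1} ln(d_k / d_i) )^(-1).
    """
    k = len(knn_dists)
    d_k = knn_dists[-1]
    s = sum(math.log(d_k / d_i) for d_i in knn_dists[:-1])
    return (s / (k - 1)) ** -1

# Small worked example with k = 3: geometrically growing distances
# give a low intrinsic-dimensionality estimate.
print(round(lid_mle([1.0, 2.0, 4.0]), 2))  # → 0.96
```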

Empirical ablations demonstrate that LID-driven insertion yields the single largest improvement in recall (+12–18 percentage points), indicating that optimizing connectivity via LID is pivotal for cluster bridging effectiveness and overall search accuracy (Nguyen et al., 23 Jan 2025).

4. Algorithmic Workflow

The HNSW++ pipeline integrates LID estimation with dual-branch layer assignment and search. Core workflow steps:

  • Layer and Branch Assignment: Sort data by descending normalized LID, alternately assign each node’s branch ($b \in \{0,1\}$), and sample its HNSW layer in typical fashion, capping at $L_{\max} - 1$.
  • Insertion: For a node $q$, greedy descent is performed in its assigned branch from the top layer down to just above the assigned layer, with links established according to standard HNSW neighbor selection.
  • Search: Simultaneous top-layer descent is performed in both branches (see pseudocode in source), with skip-bridge evaluations at each step. On reaching layer 0, candidate sets are exchanged to remove duplicates, then merged for final selection.

This dual traversal ensures complementary subgraph exploration, facilitating robust recall and mitigating the impact of local optima (Nguyen et al., 23 Jan 2025).
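The layer-and-branch assignment step can be sketched as follows, assuming normalized LID values are already available; the exponential layer-sampling rule is the standard HNSW one, and all names here are illustrative:

```python
import math
import random

def assign_layers_and_branches(norm_lids, m_l=1.0, l_max=6, seed=0):
    """Alternate branch labels over nodes sorted by descending
    normalized LID, and sample each node's top layer with the usual
    HNSW rule floor(-ln(U) * m_L), capped at l_max - 1.

    Returns {node_id: (branch, layer)}.
    """
    rng = random.Random(seed)
    order = sorted(range(len(norm_lids)),
                   key=lambda i: norm_lids[i], reverse=True)
    assignment = {}
    for rank, node in enumerate(order):
        branch = rank % 2  # alternate 0, 1, 0, 1, ... over the LID order
        layer = min(int(-math.log(rng.random()) * m_l), l_max - 1)
        assignment[node] = (branch, layer)
    return assignment

out = assign_layers_and_branches([0.2, 0.9, 0.5, 0.7])
# Highest-LID node (index 1) gets branch 0; next (index 3) branch 1, etc.
print(out[1][0], out[3][0], out[2][0], out[0][0])  # → 0 1 0 1
```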

5. Theoretical Complexity and Empirical Performance

The complexity profile of HNSW++ closely mirrors standard HNSW:

  • Search: Each branch conducts $\mathcal{O}(\log N)$ traversals and heap operations. Due to skip-bridges, the expected layer-traversal count is reduced by a factor determined by the skip probability, yet overall complexity remains $\mathcal{O}(\log N)$. Layer-0 merging incurs only $\mathcal{O}(k \log k)$ cost.
  • Construction: Each insertion costs $\mathcal{O}(\log N)$ per branch; total build is $\mathcal{O}(N \log N)$. The halving of nodes per branch lowers construction-time constants by roughly a factor of 2, with LID computation being a one-time $\mathcal{O}(N)$ operation.

Empirical benchmarks demonstrate:

  • Recall@10 improvement of +18% for NLP (e.g., GloVe) and up to +30% for CV datasets (e.g., SIFT, GIST, DEEP).
  • Build time reduction of ≈20% with identical main parameters ($M$, $ef_{\mathrm{construct}}$).
  • Query throughput is nearly unchanged (within 2% of standard HNSW), despite the dual traversal.
  • For instance, on the GIST dataset (960D), recall@10 improves from 0.70 to 0.91, build time from 1.0s to 0.8s (Nguyen et al., 23 Jan 2025).

6. Ablation Studies

Comprehensive ablation across four variants (baseline HNSW, dual-branch only, LID-only, full HNSW++) indicates:

  • LID-driven layer assignment accounts for the highest recall boost (+12–18%).
  • Dual-branch structure is responsible for an additional +6–10% in recall and nearly halves build time.
  • Skip-bridges decrease query time by 5–10% with negligible effect on recall.
  • Combined, LID-driven assignment and dual-branch structure recover nearly the full recall observed for the complete method; skip-bridges compensate for query-time overhead introduced by dual branching (Nguyen et al., 23 Jan 2025).

7. Parameterization and Practical Usage

Operational guidelines include:

  • $M$ (maximum node links): standard HNSW values (16–64).
  • $ef_{\mathrm{construction}}$: 100–200 for high-recall index generation.
  • $ef_{\mathrm{search}}$: 50–150, tuned to latency constraints.
  • LID jump threshold $T$: 0.7–0.9 after normalization, ideally selected to ensure 15–30% skip-bridge activation.
  • Distance threshold $\varepsilon$: typically set to the mean nearest-neighbor distance.

A caveat is noted for high-dimensional regimes ($\mathrm{LID} \gg 50$), where LID normalization may compress the distribution; $T$ should be adjusted to sustain a non-negligible skip frequency. No accuracy–speed tradeoff is reported for HNSW++ within the tested parameter regimes (Nguyen et al., 23 Jan 2025).
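One heuristic way to pick $T$ consistent with the 15–30% activation guideline is to set it from the empirical distribution of normalized LID values; this quantile-based recipe is an assumption for illustration, not a procedure from the paper:

```python
def pick_skip_threshold(norm_lids, target_activation=0.25):
    """Choose T as the (1 - target_activation) quantile of the
    normalized LID values, so roughly target_activation of entry
    points exceed it. Simple nearest-rank quantile, no interpolation.
    """
    vals = sorted(norm_lids)
    idx = min(int(len(vals) * (1.0 - target_activation)), len(vals) - 1)
    return vals[idx]

lids = [i / 100 for i in range(100)]       # uniform 0.00 .. 0.99
T = pick_skip_threshold(lids, 0.25)
print(T)                                    # → 0.75
print(sum(v > T for v in lids) / len(lids)) # → 0.24
```

If LID normalization compresses the distribution in very high dimensions, this quantile choice adapts automatically, whereas a fixed $T$ would not.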
