Slipstream: Locality-Aware Graph Index Construction for Streaming Approximate Nearest Neighbor Search

Published 2 Jun 2026 in cs.IR | (2606.02992v1)

Abstract: Graph indexes are widely used for high-recall approximate nearest neighbor search (ANNS), but many real-time applications require streaming ANNS. In these real-time applications, continuously arriving embeddings must search the existing graph for candidate neighbors before updating graph edges, which makes repeated index construction a bottleneck for streaming ingestion workloads. We propose Slipstream, a new method that significantly reduces the computational cost of frequent insertions in graph indexes for ANNS. The core idea of Slipstream is exploiting the continuity in vector streams: the newly arrived point starts from promising candidates found during the previous insertion rather than searching from the entry point. More technically, Slipstream evaluates distinct subsets of starting candidates followed by an adaptive controller that narrows or widens the range according to the stream's stability. We further show that Slipstream is beyond heuristic: We derive an abstract model to characterize Slipstream's performance and analyze its theoretical bounds. We have implemented Slipstream in two popular open-source libraries (Faiss, HNSWLib) and compared it with four baseline methods on five streaming vector datasets. Experimental results show that Slipstream achieves up to 30.8$\times$ higher end-to-end throughput than baselines while maintaining at least 0.95 recall@10.


Authors (2)

Summary

The paper's main contribution is Slipstream, which leverages temporal locality in streaming embeddings to significantly reduce insertion costs for graph-based ANNS.
Slipstream employs a proximity ratio and an adaptive controller to warm-start candidate searches, maintaining high recall while dramatically increasing throughput.
Empirical evaluations on diverse video workloads show up to 30.8× faster insertion rates with negligible recall degradation, confirming its efficiency and scalability.

Introduction and Motivation

Continuous, ingestion-heavy vector streams, common in domains such as video analytics, retrieval-augmented generation (RAG), AI agent memory, and real-time recommendation, place stringent efficiency demands on graph-based ANNS index construction. Existing methods, especially HNSW and its variants, pay a high computational cost per insertion because every insertion repeats the neighbor search from the global entry point. However, empirical analysis of embedding streams (notably in video data) reveals strong temporal locality, with consecutive embeddings residing in adjacent or even overlapping regions of the metric space.

Figure 1: In index maintenance-intensive streaming workloads, the majority of online time is spent on repeated insertion; direct empirical analysis shows high temporal locality of consecutive embedding vectors.

This observation motivates Slipstream, a method designed to exploit stream continuity for amortizing insertion cost. By caching and adaptively reusing promising candidate vertices from recent insertions, Slipstream introduces significant computational reductions in graph construction, all while retaining state-of-the-art query recall. The method departs from prior work by reframing insertion initialization: rather than restarting from a static or randomly chosen entry point, insertion proceeds from a dynamically maintained, locality-informed seed set, controlled by a statistical proximity ratio and an adaptive controller.

Methodology: Slipstream Index Construction

Pipeline Overview

Slipstream integrates with any graph-ANNS insertion pipeline that contains a candidate search phase (e.g., HNSW). The key innovation is the replacement of stateless, independent search initializations with a reuse mechanism underpinned by a locality-sensitive criterion. The insertion pipeline is restructured as follows:

Segmentation: Incoming batches are partitioned at discontinuous jumps in stream locality.
Cache Maintenance: Each segment maintains a cache containing the anchor point, candidate set, selected neighbors, and local scale of proximity.
Proximity Evaluation: Upon each new insertion, the proximity ratio $\lambda_t$ (distance between consecutive points, normalized by local scale) is computed to determine whether to reuse the cache.
Adaptive Control: If proximity remains within threshold, insertion warm-starts with reduced beam width; otherwise, fallback triggers a full, vanilla insertion.
Controller Update: The insertion beam width for warm-starts is tuned online by an adaptive controller based on measured drift in stream locality.

Figure 2: Slipstream’s insertion pipeline, leveraging cached candidates from prior insertions and an adaptive controller responding to real-time stream drift.

Notably, all modifications are confined to the dense, cost-dominant bottom layer (layer 0) of HNSW, ensuring orthogonality with hierarchical routing and degree-riding mechanisms in upper layers.
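
The control flow above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: `FlatGraph` is a hypothetical single-layer toy graph standing in for HNSW layer 0, `stream_insert` is an illustrative name, and the values of `R`, `ef_full`, and `ef_warm` are arbitrary placeholders. The local scale is assumed here to be the mean distance from the inserted point to its selected neighbors, and the warm-start beam width is fixed rather than controlled adaptively.

```python
import heapq
import numpy as np


class FlatGraph:
    """Toy single-layer proximity graph, standing in for HNSW layer 0."""

    def __init__(self, M=8):
        self.M = M          # neighbors linked per insertion
        self.vecs = []      # node id -> vector
        self.adj = []       # node id -> neighbor ids

    def _dist(self, q, i):
        return float(np.linalg.norm(q - self.vecs[i]))

    def beam_search(self, q, seeds, ef):
        """Best-first search from `seeds`; returns up to ef (dist, id) pairs, nearest first."""
        seeds = list(dict.fromkeys(seeds))            # drop duplicate seeds
        visited = set(seeds)
        frontier = [(self._dist(q, s), s) for s in seeds]
        heapq.heapify(frontier)                       # min-heap: closest unexpanded node first
        best = [(-d, s) for d, s in frontier]
        heapq.heapify(best)                           # max-heap via negated distance
        while len(best) > ef:
            heapq.heappop(best)                       # keep only the ef closest seeds
        while frontier:
            d, u = heapq.heappop(frontier)
            if len(best) >= ef and d > -best[0][0]:
                break                                 # nothing closer is left to expand
            for v in self.adj[u]:
                if v in visited:
                    continue
                visited.add(v)
                dv = self._dist(q, v)
                if len(best) < ef or dv < -best[0][0]:
                    heapq.heappush(frontier, (dv, v))
                    heapq.heappush(best, (-dv, v))
                    if len(best) > ef:
                        heapq.heappop(best)
        return sorted((-nd, i) for nd, i in best)

    def insert(self, q, seeds, ef):
        """Search for candidates, then link q to its M nearest (no pruning in this toy)."""
        found = self.beam_search(q, seeds, ef) if seeds else []
        new_id = len(self.vecs)
        self.vecs.append(q)
        self.adj.append([i for _, i in found[: self.M]])
        for _, i in found[: self.M]:
            self.adj[i].append(new_id)
        return new_id, found


def stream_insert(graph, stream, R=1.0, ef_full=32, ef_warm=8):
    """Insert a stream, warm-starting from the previous insertion when lambda_t <= R."""
    cache = None        # (anchor p_{t-1}, cached candidate ids, local scale)
    warm_starts = 0
    for p in stream:
        reuse = cache is not None and np.linalg.norm(p - cache[0]) / cache[2] <= R
        if reuse:       # warm start: seed from cached candidates, narrow beam
            new_id, found = graph.insert(p, seeds=cache[1], ef=ef_warm)
            warm_starts += 1
        else:           # fallback: full-width search from the global entry point (node 0)
            new_id, found = graph.insert(p, seeds=[0] if graph.vecs else [], ef=ef_full)
        if found:       # refresh the cache with this insertion's results
            scale = float(np.mean([d for d, _ in found[: graph.M]]))
            cache = (p, [i for _, i in found] + [new_id], max(scale, 1e-12))
    return warm_starts


rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(scale=0.05, size=(300, 8)), axis=0)   # smooth synthetic stream
graph = FlatGraph(M=8)
warm = stream_insert(graph, walk)
print(f"{warm} of {len(walk)} insertions were warm-started")
```

On a smooth stream such as this random walk, most insertions take the warm-start branch; a stream with abrupt jumps would trigger the fallback branch more often.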

Proximity Ratio and Fallback Policy

Slipstream's safety-critical criterion for cache reuse is the proximity ratio:

$\lambda_t = \frac{d(p_t, p_{t-1})}{\bar d_0(p_{t-1})}$

where $d(\cdot,\cdot)$ is the distance metric and $\bar d_0(p_{t-1})$ is the mean distance from $p_{t-1}$ to its local neighborhood. When $\lambda_t \leq R$ (with $R$ derived analytically from stream statistics), the cached candidates are deemed reliable; otherwise, the fallback discards the cache and the insertion follows standard HNSW logic. The adaptive calibration of $R$ rests on an Erlang tail model.
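
As a worked example of the ratio, the sketch below computes $\lambda_t$ for two arrivals with Euclidean distance. The function name `proximity_ratio`, the choice of neighbors, and the threshold value `R = 1.0` are illustrative assumptions; the summary does not give the calibrated value of $R$ or the exact definition of the local neighborhood.

```python
import numpy as np


def proximity_ratio(p_t, p_prev, neighbor_vecs):
    """lambda_t = d(p_t, p_prev) / mean distance from p_prev to its neighbors."""
    step = np.linalg.norm(p_t - p_prev)
    local_scale = np.mean(np.linalg.norm(neighbor_vecs - p_prev, axis=1))
    return float(step / local_scale)


p_prev = np.array([0.0, 0.0])
# Neighbors at distances 1, 2, and 3 from p_prev, so the local scale is 2.
neighbors = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]])
R = 1.0  # placeholder threshold, not the paper's calibrated value

near = proximity_ratio(np.array([1.0, 0.0]), p_prev, neighbors)  # 1 / 2 = 0.5
far = proximity_ratio(np.array([3.0, 4.0]), p_prev, neighbors)   # 5 / 2 = 2.5

print(near, "reuse cache" if near <= R else "fallback")
print(far, "reuse cache" if far <= R else "fallback")
```

Normalizing by the local scale makes the test density-aware: the same absolute step counts as a small move in a sparse region and as a large one in a dense region.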

Adaptive Controller

To further optimize insertion effort, an adaptive controller operates whenever cache reuse is permissible. The controller contracts or expands the insertion beam width (the number of candidates kept during the search) based on observed stream drift. Tuning is regulated by escalation and contraction steps, with a threshold $T$ separating the "stable" (narrow) and "unstable" (wide) regions. The controller balances robust recall against efficiency and converges to a predictable equilibrium governed by segment-level stream statistics.
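
One plausible reading of this controller is a simple additive rule, sketched below. Everything specific here is an assumption made for illustration: the class name `BeamController`, the use of a scalar drift signal (for instance the proximity ratio), the step sizes, and the width bounds. The paper's actual update rule may differ.

```python
from dataclasses import dataclass


@dataclass
class BeamController:
    """Hypothetical beam-width controller: contract when stable, escalate when not."""

    ef_min: int = 8          # narrowest warm-start beam width
    ef_max: int = 64         # widest beam width
    ef: int = 32             # current (initial) beam width
    T: float = 0.5           # drift threshold between stable and unstable regions
    contract_step: int = 4   # how much to narrow after a stable insertion
    escalate_step: int = 16  # how much to widen after an unstable insertion

    def update(self, drift: float) -> int:
        """Adjust and return the beam width for the next warm-start insertion."""
        if drift <= self.T:
            self.ef = max(self.ef_min, self.ef - self.contract_step)
        else:
            self.ef = min(self.ef_max, self.ef + self.escalate_step)
        return self.ef


ctrl = BeamController()
stable = [ctrl.update(0.1) for _ in range(8)]  # narrows step by step down to ef_min
burst = ctrl.update(0.9)                       # one unstable step widens the beam again
print(stable, burst)
```

Making escalation larger than contraction is a deliberate choice in this sketch: it errs toward recall by widening quickly when the stream drifts and narrowing only gradually once it settles.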

Theoretical Guarantees

Slipstream's design is accompanied by a rigorous analytical framework:

Insertion Quality Bound: Candidate set recall at each insertion is lower-bounded by the candidate recall obtained under vanilla HNSW with the minimal beam width used by the controller.
Recall Floor: Aggregate graph-level query recall is tightly bounded below by that of the reference construction at the controller’s minimum width, with negligible degradation due to fallback mass.
Controller Equilibrium: Segment-averaged insertion width is shown empirically and theoretically to conform to a power-law response in the drift-balance and threshold parameters, yielding a closed-form calibration for iso-recall operation.

Figure 3: Empirical validation of the controller equilibrium model across five streaming video workloads, demonstrating precise prediction of segment-averaged insertion width from controller parameters.

Empirical Evaluation

A comprehensive set of experiments evaluates Slipstream on five diverse streaming embedding workloads: Kinetics, BDD100K, Epic-Kitchens, Ego4D, and VIRAT, all utilizing CLIP-encoded frame embeddings. Slipstream is integrated into both HNSWLib and Faiss and benchmarked against four baselines: HNSWLib-Vanilla, Faiss-Vanilla, Ada-ef, and DARTH.

Throughput–Recall Tradeoff

Slipstream achieves substantial improvements in streaming throughput, with no material sacrifice in recall@10:

Faiss backend: Up to 30.8 $\times$ faster insertion rates (ranging 40K–52K embeddings/s) at recall@10 $\geq$ 0.954, compared to the best-performing baseline.
HNSWLib backend: Speedups of up to 23.1 $\times$ observed.
The method strictly dominates baselines in the high-recall, high-throughput regime.

Figure 4: Streaming throughput versus recall@10 across all video workloads; Slipstream shifts the Pareto frontier upward, preserving high recall at significant speedup.
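
For readers unfamiliar with the metric, recall@10 is the fraction of the true 10 nearest neighbors that the index returns, averaged over queries. The sketch below shows the standard computation against brute-force ground truth; the helper names `exact_knn` and `recall_at_k` and the toy data are illustrative and are not taken from the paper's evaluation code.

```python
import numpy as np


def exact_knn(base, queries, k=10):
    """Brute-force ground truth: ids of the k nearest base vectors per query."""
    dists = np.linalg.norm(queries[:, None, :] - base[None, :, :], axis=2)
    return np.argsort(dists, axis=1)[:, :k]


def recall_at_k(approx_ids, exact_ids, k=10):
    """Mean fraction of the true top-k neighbors found in the returned top-k."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))


base = np.arange(20, dtype=float)[:, None]   # 20 points on a line: 0, 1, ..., 19
queries = np.array([[0.2]])
truth = exact_knn(base, queries, k=10)       # ids 0..9 for this query

approx = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 15]]   # one of the ten neighbors is wrong
print(recall_at_k(approx, truth, k=10))      # 9 hits out of 10
```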

Runtime Decomposition

A breakdown of online latency confirms that Slipstream eliminates the insertion bottleneck—unlike Ada-ef and DARTH, whose adaptation mechanisms induce additional overhead via estimator recomputation and model retraining, respectively.

Figure 5: Operation time composition: Slipstream’s online runtime is dominated by efficient insertion, whereas adaptation-centric baselines pay substantial maintenance overheads.

Sensitivity and Ablations

Parameter sweeps confirm the stability of Slipstream with respect to its core knobs: varying the initial and minimum beam widths, as well as the proximity and escalation thresholds, yields plateaued performance surfaces rather than narrow optima.

Figure 6: Streaming throughput and recall@10 are robust to wide ranges of Slipstream parameterization, consistent across all streaming workloads.

Ablation analysis identifies the fallback mechanism as the single most impactful contributor to throughput, followed by further gains from the adaptive controller. Memory analysis shows that Slipstream does not incur additional persistent space overhead.

Implications and Future Directions

The theoretical and empirical results in this work motivate a reconsideration of index construction in streaming scenarios. By exposing and capitalizing on temporal locality, Slipstream changes the cost structure of online ANNS. This applies directly to RAG systems, long-form video analysis, adaptive agent memory, and real-time personalized search: settings that require high insertion rates under high recall constraints and whose streams are locally structured. Slipstream's minimally intrusive design keeps it compatible with existing and future hierarchical graph-based indexes, which should ease adoption in production vector databases.

Future research directions include generalizing locality-adaptive cache reuse to distributed or disk-based graph indexes, integration with active learning for automatic stream drift detection, and deep co-design with vector quantization/partitioning methods to further amplify throughput gains in ultra-large scale settings.

Conclusion

Slipstream introduces a new approach to streaming ANNS index construction that systematically exploits stream locality through proximity-aware cache reuse and adaptive control of insertion effort. It is analytically grounded, empirically validated across five streaming video workloads, and achieves up to 30.8$\times$ higher streaming throughput while maintaining recall@10 of at least 0.95, without additional persistent memory overhead. These results position it as a strong reference point for efficient, locality-aware streaming graph index construction in high-throughput, real-time vector search scenarios.
