
SINDI: Sparse Inverted Non-Redundant Index

Updated 11 September 2025
  • SINDI is a high-efficiency indexing framework that integrates SIMD processing, windowed memory access, and mass-ratio pruning for fast approximate maximum inner product search.
  • Its design minimizes redundant distance computations and optimizes cache-friendly sequential access to significantly improve query throughput and latency.
  • SINDI is incorporated into Ant Group’s VSAG library, providing a scalable solution for retrieval-augmented generation and large-scale sparse search applications.

The Sparse Inverted Non-Redundant Distance Index (SINDI) is a high-efficiency indexing and retrieval framework for fast approximate maximum inner product search (MIPS) on high-dimensional sparse vectors. It addresses bottlenecks prevalent in production environments—namely redundant distance computations, frequent random memory accesses, and limited acceleration opportunities due to compressed storage formats. SINDI integrates batched SIMD-enabled inner product calculation, sequential window-based memory access design, and mass-based vector pruning to achieve significant improvements in query throughput and latency over traditional inverted index and graph-based systems. As of 2025, SINDI has been incorporated into Ant Group’s open-source vector search library VSAG, providing a scalable solution for retrieval-augmented generation and other sparse search applications (Li et al., 10 Sep 2025).

1. Efficient Inner Product Computation via SIMD

SINDI optimizes the computation of inner products between sparse vectors and queries by minimizing redundant operations and utilizing SIMD (Single Instruction Multiple Data) acceleration. In classical inverted index approaches, computing the inner product $\delta(\mathbf{x}, \mathbf{q}) = \sum_{j \in \Omega(\mathbf{x}, \mathbf{q})} x_j \cdot q_j$ requires identifying overlapping nonzero dimensions and subsequently performing multiplications. SINDI eliminates the unnecessary “identifier lookup” by storing both the document identifier and the feature value together within each posting in the inverted list for a given dimension.

For each nonzero query component $q_j$, the corresponding inverted list provides all relevant document values $x_j$ in a contiguous block. Using SIMD instructions, SINDI multiplies $q_j$ by a vector of $x_j$ entries simultaneously, storing results directly into a temporary product array $T^j$. This approach amortizes the computation cost, reducing it from the entry-wise $O(\|\mathbf{q}\| + \|\mathbf{x}\|)$ to approximately $\Theta(\|\mathbf{q}\| / s)$, where $s$ is the SIMD width. Theorem 1 from (Li et al., 10 Sep 2025) formally establishes that batch-wise processing via SIMD is feasible due to this unified inverted list structure.
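The C++ sketch below illustrates this batched accumulation under an assumed structure-of-arrays posting layout; the struct and function names are illustrative, not VSAG's actual API. The multiply loop runs over one contiguous value block, which is what makes it amenable to SIMD vectorization.

```cpp
#include <cstdint>
#include <vector>

// Illustrative posting list for one dimension j: document IDs and the
// matching feature values x_j stored side by side (structure of arrays),
// so the values form one contiguous block that SIMD loads can stream.
struct PostingList {
    std::vector<uint32_t> ids;   // document identifiers
    std::vector<float>    vals;  // feature values x_j for those documents
};

// Accumulate q_j * x_j for every posting of dimension j into the
// per-document score array A. The multiply loop touches only the
// contiguous vals[] block, so a compiler can auto-vectorize it (or it
// can be written with explicit SIMD intrinsics); no identifier matching
// happens during the multiplication itself.
void accumulate_dimension(float qj, const PostingList& plist,
                          std::vector<float>& A) {
    const std::size_t n = plist.vals.size();
    std::vector<float> prod(n);           // temporary product array T^j
    for (std::size_t i = 0; i < n; ++i)   // SIMD-friendly batched multiply
        prod[i] = qj * plist.vals[i];
    for (std::size_t i = 0; i < n; ++i)   // scatter products to documents
        A[plist.ids[i]] += prod[i];
}
```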

This method avoids repeated identifier matching and enables full utilization of processor vectorization capabilities, yielding high throughput for large-scale MIPS tasks.

2. Memory-Friendly Windowed Index Design

Random memory access is a primary latency source in sparse vector retrieval, especially when posting lists are distributed across the entire dataset. SINDI introduces a “Window Switch” strategy that partitions both the dataset and the inverted lists into contiguous windows of size $\lambda$. During query processing, each window covers a fixed range of document IDs, and the main distance-accumulation array $A$ only spans $\lambda$ entries.

Sequential processing of postings within each inverted list window minimizes cache misses and avoids the scattered access patterns characteristic of conventional approaches. Documents are addressed by reducing their global identifiers modulo the window size, so score updates aggregate locally within the active window, as sketched below.
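A minimal sketch of this windowed scoring loop follows, reusing the PostingList struct from the previous sketch; the partitioned layout (window_postings), parameter names, and the assumption that every window holds exactly $\lambda$ documents are illustrative simplifications, not the paper's exact data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative window-switch scoring loop. window_postings[w][j] is
// assumed to hold the postings of dimension j whose document IDs fall
// in window w, i.e. IDs in [w * lambda, (w + 1) * lambda).
void score_windowed(
    const std::vector<std::pair<uint32_t, float>>& query,  // (dim, q_j)
    const std::vector<std::vector<PostingList>>& window_postings,
    std::size_t lambda,
    std::vector<float>& scores) {   // size: total number of documents
    std::vector<float> A(lambda);   // accumulator spans one window: cache-resident
    for (std::size_t w = 0; w < window_postings.size(); ++w) {
        std::fill(A.begin(), A.end(), 0.0f);  // reset between windows
        for (const auto& [dim, qj] : query) {
            const PostingList& pl = window_postings[w][dim];
            for (std::size_t i = 0; i < pl.ids.size(); ++i)
                A[pl.ids[i] % lambda] += qj * pl.vals[i];  // local (modular) ID
        }
        for (std::size_t local = 0; local < lambda; ++local)
            scores[w * lambda + local] = A[local];  // write back global scores
    }
}
```

Because $A$ stays small enough to remain cache-resident and each posting list window is scanned front to back, memory traffic is sequential rather than scattered across the whole score array.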

The effectiveness of this design is empirically supported by performance diagrams showing reduced memory-bound latency as $\lambda$ decreases, resulting in sequential rather than random access patterns. This method enables stable, predictable speedups in multi-threaded and large-batch environments.

3. Mass Ratio Vector Pruning

Vector pruning is crucial for increasing efficiency without sacrificing retrieval accuracy. SINDI employs mass-based pruning, specifically the “Mass Ratio Pruning” (MRP) technique. For a given vector $\mathbf{x}$, the total mass is defined as $\xi(\mathbf{x}) = \sum_j |x_j|$. SINDI retains only the smallest prefix of dimensions, sorted in decreasing order of $|x_j|$, whose magnitudes sum to at least an $\alpha$ fraction of the total mass (i.e., the $\alpha$-mass subvector).
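A straightforward C++ implementation of this definition might look as follows; the SparseVec layout and function name are illustrative assumptions, not code from the paper or VSAG.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Illustrative sparse vector: parallel arrays of dimensions and values.
struct SparseVec {
    std::vector<uint32_t> dims;
    std::vector<float>    vals;
};

// Keep the smallest prefix of highest-magnitude coordinates whose
// absolute values sum to at least alpha * xi(x), where xi(x) = sum |x_j|.
SparseVec mass_ratio_prune(const SparseVec& x, float alpha) {
    // Order coordinate positions by decreasing |x_j|.
    std::vector<std::size_t> order(x.vals.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return std::fabs(x.vals[a]) > std::fabs(x.vals[b]);
    });

    float total_mass = 0.0f;
    for (float v : x.vals) total_mass += std::fabs(v);  // xi(x)

    SparseVec pruned;
    float kept_mass = 0.0f;
    for (std::size_t idx : order) {
        if (kept_mass >= alpha * total_mass) break;  // alpha-mass reached
        pruned.dims.push_back(x.dims[idx]);
        pruned.vals.push_back(x.vals[idx]);
        kept_mass += std::fabs(x.vals[idx]);
    }
    return pruned;
}
```

For example, given values (0.5, 0.3, 0.1, 0.1) and $\alpha = 0.8$, the two largest entries already account for 0.8 of the total mass, so the remaining two coordinates are dropped.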

This approach preserves key semantic features, as learned sparse models (e.g., SPLADE) concentrate most relevant information in a small set of high-magnitude coordinates. Empirical results demonstrate that retaining only 16–30% of nonzero entries yields negligible error in inner product estimation, while query throughput improves noticeably.

Multiple pruning strategies are discussed, though MRP consistently provides the best trade-off between efficiency and accuracy. This pruning is applied at index build time and during query execution, ensuring that inverted lists remain short and computation focuses on highly informative dimensions.

4. Comparative Performance Evaluation

SINDI is evaluated on multiple real-world benchmarks, including MS MARCO, across varying dataset scales and languages. At Recall@50 above 99%, SINDI attains single-thread queries-per-second (QPS) gains of 4.2×–26.4× over systems such as Seismic (Bruch et al., 29 Apr 2024) and PyANNs. These improvements are attributed directly to batched SIMD computation, windowed memory access, and vector pruning.

Additional experimental results indicate lower index construction time and reduced memory footprints compared with both plain inverted index and graph-based approximate nearest neighbor methods. SINDI maintains high retrieval accuracy (recall) under aggressive vector pruning settings, validating the non-redundancy of its index structure. Aggregate QPS and latency figures are reported in tabular form in the source paper, confirming improved scalability and suitability for production deployment.

5. Integration into Production Systems and Open-Source Libraries

SINDI has been incorporated into Ant Group’s VSAG, a scalable and modular vector search library. This integration makes SIMD-accelerated sparse search, windowed index processing, and mass-ratio pruning available to the broader development and research community. Practical benefits include support for large-scale retrieval-augmented generation (RAG) pipelines, semantic document search, and other multi-path retrieval tasks.

The VSAG integration exposes robust API hooks for customizing index building parameters and mass-ratio thresholds. Out-of-the-box defaults leverage SINDI’s design for efficient retrieval across millions of high-dimensional sparse vectors, ensuring low latency and resource overhead.

A plausible implication is that this availability in a widely-used open-source library could accelerate adoption of high-efficiency sparse retrieval by both industry and academia.

6. Context and Methodological Relationships

The principles behind SINDI build on several lines of research in sparse vector retrieval, including efficient dot-product based indexing (Wang et al., 2010), memory-aware block-based designs (Bruch et al., 29 Apr 2024), and classic direct approaches to high-dimensional nearest neighbor search (0810.4188). SINDI stands apart by integrating non-redundant index organization, batched computation, and memory optimizations, which are only partially realized in earlier systems.

Its focus on non-redundancy means SINDI avoids superfluous candidate evaluations by focusing only on the most statistically informative vector dimensions, a goal similarly pursued in boosting-based metric learning (Ma et al., 2015) and grouped subregion pruning in large-scale ANN indices (Baranchuk et al., 2018).

SINDI’s algorithmic advances—especially the elimination of redundant identifier lookups and cache-optimized processing—are supported by theoretical analysis of computational and memory cost, and by empirical benchmarks against state-of-the-art approximate nearest neighbor search systems.

7. Significance and Future Directions

The SINDI framework enables fast, accurate sparse vector MIPS with low computational and memory overhead. Its adoption in open-source infrastructure implies a broad impact for the deployment of retrieval-augmented models and high-throughput search engines. Future research directions may include extending SINDI for hybrid dense-sparse retrieval (Zhang et al., 27 Oct 2024), adapting block-based approximate filtering (Bruch et al., 29 Apr 2024), and incorporating learned metrics for further improved accuracy and non-redundancy (Ma et al., 2015).

Further comparisons to graph-based methods (Boytsov et al., 2019) as well as evaluation against increasingly large datasets will clarify SINDI’s position as a production-grade solution for fast approximate similarity search in high-dimensional sparse regimes.
