Sparse Retrieval Models
- Sparse retrieval models are IR architectures that encode queries and documents as high-dimensional sparse vectors using term weighting and expansion for enhanced interpretability and efficiency.
- They leverage classical inverted-index infrastructures integrated with neural network methods to achieve rapid, scalable retrieval across diverse benchmarks.
- Recent advances include pruning strategies, FLOPS regularization, and ensemble distillation, which together balance accuracy, latency, and efficient deployment.
Sparse retrieval models are a family of information retrieval (IR) architectures that encode queries and documents as sparse, high-dimensional vectors over a fixed vocabulary. These explicit, interpretable representations leverage the classical advantages of inverted-index storage, enabling efficient large-scale retrieval while incorporating semantic generalization capabilities from pretrained neural models. In contemporary research, sparse retrievers have evolved from classical term-matching schemes (e.g., BM25) to highly parameterized neural methods that learn to produce both re-weighted and expanded sparse term vectors, yielding state-of-the-art performance on both in-domain and zero-shot IR benchmarks.
1. Architectural Foundations and Sparse Representation
Sparse retrieval models encode queries and documents into non-negative vectors $\mathbf{w} \in \mathbb{R}_{\geq 0}^{|V|}$, where $|V|$ is the vocabulary size. Each dimension corresponds to the weight assigned to a vocabulary term, permitting explicit interpretability and providing a basis for efficient scoring. Major variants include symmetric (“siamese”) encoders (a shared architecture for both queries and documents) and asymmetric or inference-free approaches (document-side encoding only, with queries handled by simple lookups or static rules).
Weights are typically obtained via transformer-based deep networks with an output MLP or masked language modeling (MLM) head, applied to each token’s contextual embedding. A sparsifying activation (usually ReLU followed by log-saturation) encourages zeros in non-informative coordinates:

$$w_j = \sum_i \log\!\big(1 + \mathrm{ReLU}(h_i^\top e_j + b_j)\big),$$

where $h_i$ is a contextual token embedding, $e_j$ the vocabulary embedding of term $j$, and $b_j$ a term bias (Formal et al., 2021, Formal et al., 2021).
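The following is a minimal PyTorch sketch of this encoding recipe, assuming a Hugging Face masked-LM backbone (`bert-base-uncased` is only a placeholder checkpoint) and sum pooling over token positions as in the formula above; it illustrates the mechanics rather than any specific trained model.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def encode_sparse(text: str) -> torch.Tensor:
    """Map text to a |V|-dimensional non-negative, mostly-zero vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits            # (1, seq_len, |V|) MLM-head scores per token and term
    sat = torch.log1p(torch.relu(logits))          # log(1 + ReLU(.)) saturation per coordinate
    mask = inputs["attention_mask"].unsqueeze(-1)  # ignore padding positions
    return (sat * mask).sum(dim=1).squeeze(0)      # sum-pool over tokens -> (|V|,)

w = encode_sparse("sparse retrieval with inverted indexes")
print((w > 0).sum().item(), "nonzero terms out of", w.numel())
```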
Retrieval is executed by computing the inner product between query and document vectors:

$$s(q, d) = \mathbf{w}_q^{\top} \mathbf{w}_d = \sum_{j \in V} w_{q,j}\, w_{d,j}.$$
Inverted-index infrastructures store only nonzero entries, supporting efficient posting-list–based retrieval (Nguyen et al., 2023).
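The posting-list mechanics can be sketched in a few lines of plain Python; the dictionaries below are a stand-in for a real engine such as Lucene or PISA, and the toy term ids and weights are invented for illustration.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """doc_vectors: {doc_id: {term_id: weight}}; only nonzero entries are stored."""
    postings = defaultdict(list)                    # term_id -> [(doc_id, weight), ...]
    for doc_id, vec in doc_vectors.items():
        for term_id, w in vec.items():
            if w > 0:
                postings[term_id].append((doc_id, w))
    return postings

def score(query_vector, postings):
    """Inner product s(q, d), accumulated over the query's nonzero terms only."""
    scores = defaultdict(float)
    for term_id, wq in query_vector.items():
        for doc_id, wd in postings.get(term_id, []):
            scores[doc_id] += wq * wd
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

index = build_index({"d1": {3: 1.2, 7: 0.4}, "d2": {3: 0.6, 9: 2.0}})
print(score({3: 0.8, 9: 1.0}, index))               # d2 outranks d1
```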
2. Training Objectives, Regularization, and Expansion
Sparse retrieval relies on (a) ranking objectives and (b) sparsity-inducing regularization. Ranking is optimized by InfoNCE contrastive objectives, margin-based losses, or distillation losses matching the teacher (often a cross-encoder or strong retriever); in the contrastive case,

$$\mathcal{L}_{\text{rank}} = -\log \frac{e^{s(q, d^{+})}}{e^{s(q, d^{+})} + \sum_{d^{-}} e^{s(q, d^{-})}},$$

where $d^{+}$ is a relevant document and $d^{-}$ ranges over in-batch or hard negatives.
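A minimal PyTorch sketch of the contrastive objective over sparse dot-product scores, plus an optional margin-MSE distillation term; the batch shapes and in-batch-negative setup are assumptions, not a specific paper's training recipe.

```python
import torch
import torch.nn.functional as F

def info_nce(q, d_pos, d_neg):
    """q, d_pos: (B, |V|); d_neg: (B, N, |V|) non-negative sparse vectors."""
    s_pos = (q * d_pos).sum(-1, keepdim=True)       # (B, 1) positive scores
    s_neg = torch.einsum("bv,bnv->bn", q, d_neg)    # (B, N) negative scores
    logits = torch.cat([s_pos, s_neg], dim=-1)      # positive sits at index 0
    labels = torch.zeros(q.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

def margin_mse(s_pos, s_neg, t_pos, t_neg):
    """Distillation: match the student's pos-neg margin to a teacher's (e.g. a cross-encoder)."""
    return F.mse_loss(s_pos - s_neg, t_pos - t_neg)
```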
Regularization employs explicit $\ell_1$ penalties or, more commonly, the FLOPS penalty, which penalizes the expected number of multiply–adds per query:

$$\ell_{\mathrm{FLOPS}} = \sum_{j \in V} \bar{w}_j^{\,2}, \qquad \bar{w}_j = \frac{1}{N} \sum_{i=1}^{N} w_j^{(d_i)},$$

where $w_j^{(d_i)}$ is the weight of term $j$ in vector $d_i$ (Formal et al., 2021). Top-$k$ pooling and $\ell_0$-inspired masked losses provide additional sparsification mechanisms (Shen et al., 21 Apr 2025).
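The FLOPS penalty amounts to a one-liner over a batch of representations; this sketch assumes a `(batch, |V|)` tensor of non-negative weights.

```python
import torch

def flops_penalty(w_batch: torch.Tensor) -> torch.Tensor:
    """w_batch: (N, |V|) non-negative sparse representations of a batch."""
    mean_per_term = w_batch.mean(dim=0)   # \bar{w}_j, averaged over the batch
    return (mean_per_term ** 2).sum()     # squares push mass away from always-on terms
```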
Modern sparse models perform lexical “expansion”: tokens are mapped not just to their original surface forms but also to semantically related terms via the output head, enabling matching beyond literal overlap (Doshi et al., 2024, Formal et al., 2021). Expansion contributions are learned per token and can target queries, documents, or both; including both leads to a cancellation effect that saturates the marginal benefit (Nguyen et al., 2023).
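Because each dimension is a vocabulary term, expansions can be inspected directly by decoding the top-weighted coordinates; the snippet below assumes the encoder's own tokenizer (e.g., a BERT WordPiece vocabulary) and is illustrative only.

```python
import torch

def top_expansions(w: torch.Tensor, tokenizer, k: int = 10):
    """Return the k heaviest vocabulary terms of a sparse representation."""
    weights, term_ids = torch.topk(w, k)
    terms = tokenizer.convert_ids_to_tokens(term_ids.tolist())
    return list(zip(terms, weights.tolist()))

# e.g. a query like "laptop battery life" may receive nonzero weight on related
# terms that never appear in the surface text.
```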
3. Families of Sparse Retrieval Models
A representative set of model classes includes:
| Model Family | Key Mechanism | Notable Implementations |
|---|---|---|
| Lexical Reweight | Term reweight only | uniCOIL, DeepImpact, Sparta |
| Expansion | Lexical + expansion | SPLADE, Echo-Mistral-SPLADE |
| Doc-only Asymmetry | Inference-free retrieval | Li-LSR, SPLADE-doc-distill |
| LLM-based | Decoder-only architectures | Mistral-SPLADE, PROSPER |
| Multimodal Sparse | Cross-modal projections | BLIP-LSR, Prob. Exp. Control |
Early neural models such as DeepCT and uniCOIL focus on token-specific re-weighting without expansion, providing moderate improvements over BM25. Expansion frameworks, most notably SPLADE family models, generalize by mapping inputs to the full vocabulary space, resulting in richer, context-dependent expansion tokens (Formal et al., 2021, Thakur et al., 2023, Doshi et al., 2024).
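The distinction between reweighting-only and expansion models can be made concrete by masking the output vector to the input's surface terms. This is a simplified sketch (uniCOIL, for instance, predicts per-token scalar weights rather than masking a full-vocabulary output), with illustrative variable names.

```python
import torch

def restrict_to_input_terms(w_full: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Zero every coordinate that does not correspond to a surface token of the input."""
    mask = torch.zeros_like(w_full)
    mask[input_ids] = 1.0          # keep only dimensions of terms actually present
    return w_full * mask

# w_full     : expansion-style output over the whole vocabulary, shape (|V|,)
# w_reweight = restrict_to_input_terms(w_full, input_ids)  # reweighting-only view
```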
LLM-based models utilize large decoder-only architectures (e.g., Mistral-7B), harnessing massive pretraining corpora to learn more meaningful expansions and outperform previous encoder-based strategies on BEIR and MS MARCO (Doshi et al., 2024).
4. Efficiency, Indexing, and Retrieval
Sparse models are designed for compatibility with classical inverted indexes (Lucene, Pyserini, PISA, OpenSearch). After encoding, only nonzero coordinates are stored per document, yielding indices that are typically 5–10× smaller than dense embedding indexes (Song et al., 21 Oct 2025).
Query encoding remains a major bottleneck in symmetric models. Inference-free approaches (documents encoded offline, queries mapped via static lookups or lightweight scoring) reduce per-query latency to sub-millisecond levels at production scale (Nardini et al., 30 Apr 2025, Geng et al., 2024). Advances in $\ell_0$-masking and explicit thresholding have further closed the latency–relevance gap, enabling sub-10 ms query times with state-of-the-art quality (Shen et al., 21 Apr 2025).
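A minimal sketch of the inference-free pattern: document vectors are produced offline by the neural encoder, while the query side reduces to tokenization plus a static weight lookup (the IDF table here is an assumed stand-in for whatever statistic a given system uses).

```python
from collections import defaultdict

def score_inference_free(query_tokens, idf, postings):
    """No query-side transformer: weight each query token by a precomputed statistic."""
    scores = defaultdict(float)
    for tok in query_tokens:
        wq = idf.get(tok, 0.0)                      # static query-term weight
        for doc_id, wd in postings.get(tok, []):
            scores[doc_id] += wq * wd               # wd was produced offline by the encoder
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```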
Late-interaction models such as SPLATE integrate sparse candidate generation with a secondary MaxSim re-ranking step, balancing recall, latency, and CPU-only deployability (Formal et al., 2024).
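A minimal sketch of such a late-interaction re-ranking step: candidates surfaced by the sparse stage are re-scored with the MaxSim operator over token embeddings; tensor shapes and the candidate dictionary are assumptions.

```python
import torch

def maxsim(q_tok: torch.Tensor, d_tok: torch.Tensor) -> torch.Tensor:
    """q_tok: (Lq, dim) query token embeddings; d_tok: (Ld, dim) document token embeddings."""
    sim = q_tok @ d_tok.T                           # (Lq, Ld) token-level similarities
    return sim.max(dim=1).values.sum()              # best-matching doc token per query token

def rerank(q_tok, candidates):
    """candidates: {doc_id: d_tok}; reorder the sparse stage's shortlist by MaxSim."""
    return sorted(candidates, key=lambda d: maxsim(q_tok, candidates[d]).item(), reverse=True)
```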
5. Evaluation and Empirical Performance
Evaluations are performed primarily on MS MARCO (in-domain) using MRR@10 and the BEIR benchmark (zero-shot) using nDCG@10. Recent models achieve competitive or superior performance relative to dense ANN retrieval and cross-encoder rerankers:
- SPLADE v2: MRR@10 ≈ 0.34 (MS MARCO), nDCG@10 = 0.47 (BEIR) (Formal et al., 2021, Thakur et al., 2023).
- Echo-Mistral-SPLADE: nDCG@10 = 0.5507 (BEIR average), outperforming strong dense and previous sparse baselines (Doshi et al., 2024).
- Inference-free models (Li-LSR, $\ell_0$-mask): nDCG@10 ≈ 0.50 (BEIR), closing the gap to supervised siamese sparse retrievers (Nardini et al., 30 Apr 2025, Shen et al., 21 Apr 2025, Geng et al., 2024).
- Multimodal LSR: Sparse projections from frozen VLP models with expansion control rival or surpass dense vision-language retrievers on MSCOCO and Flickr30k (Song et al., 22 Aug 2025, Nguyen et al., 2024).
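For reference, simplified per-query implementations of the two metrics quoted above (binary relevance for MRR@10, linear-gain DCG for nDCG@10); benchmark toolkits such as pytrec_eval differ in details.

```python
import math

def mrr_at_10(ranked, relevant):
    """ranked: list of doc ids; relevant: set of relevant doc ids."""
    for rank, doc_id in enumerate(ranked[:10], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_10(ranked, gains):
    """gains: {doc_id: graded relevance}; linear-gain DCG with log2 discount."""
    dcg = sum(gains.get(d, 0) / math.log2(r + 1) for r, d in enumerate(ranked[:10], start=1))
    ideal = sorted(gains.values(), reverse=True)[:10]
    idcg = sum(g / math.log2(r + 1) for r, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```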
Ablation studies indicate the critical factors for effectiveness: document term weighting is indispensable, query weighting has modest value, and dual expansion brings diminishing returns (Nguyen et al., 2023). FLOPS regularization, compared with $\ell_1$, yields smoother, more balanced index usage and superior Pareto efficiency (Formal et al., 2021, Formal et al., 2021).
6. Specialized Techniques and Recent Advances
- Pragmatic retrieval: Rational Retrieval Acts introduce RSA-inspired dynamic token weighting, reweighting term-document pairs to reflect their contrastiveness in the collection, yielding statistically significant improvements for both neural and lexical baselines, particularly on out-of-domain benchmarks (Satouf et al., 6 May 2025).
- Guided traversal: Index traversal led by a fast shallow model (BM25) prunes the postings evaluated by the slower neural model, resulting in 4× end-to-end speedups with no loss of quality (Mallia et al., 2022).
- Ensemble distillation: Heterogeneous knowledge distillation combines siamese dense and sparse teachers with IDF-aware penalization, giving inference-free retrievers relevance scores on par with siamese models at only 1.1× BM25 latency (Geng et al., 2024).
- Scaling laws: In decoder-only LLMs, scaling yields monotonic retrieval quality improvement only under contrastive loss; knowledge distillation alone shows little scaling effect. Combined CL+KD at scale achieves SOTA on MS MARCO, TREC DL, BEIR (Zeng et al., 21 Feb 2025).
- Multimodal extensions: Joint optimization of dense and sparse branches, as well as probabilistic expansion control, allow adaptation of classical LSR to vision-language retrieval tasks with both interpretability and efficiency (Song et al., 22 Aug 2025, Nguyen et al., 2024).
7. Practical Considerations and Best Practices
- Always include document term weighting; expansion should be applied to either documents or queries, not both, to avoid redundancy.
- In latency-sensitive deployments, inference-free or asymmetric sparse architectures (e.g., Li-LSR, SPLADE-doc) provide optimal throughput without a transformer-based query encoder (Nardini et al., 30 Apr 2025, Geng et al., 2024).
- FLOPS-style regularization outperforms naïve $\ell_1$ when aiming for a smooth trade-off between retrieval quality and efficiency (Formal et al., 2021, Formal et al., 2021).
- Integration with traditional indexers is straightforward; query-adaptive or block-max traversal techniques further enhance retrieval speed (Mallia et al., 2022).
- LLM-based decoders (e.g., Mistral, Llama-3) with tied output embeddings and echo tricks yield improved context-sensitive expansions and unlock higher zero-shot robustness (Doshi et al., 2024, Zeng et al., 21 Feb 2025).
- Pragmatic reweighting and self-distillation improve the discriminative power and generalization of neural sparse retrievers (Satouf et al., 6 May 2025).
Sparse retrieval models thus bridge classical IR efficiency and neural semantic understanding, supporting both web- and enterprise-scale retrieval with state-of-the-art accuracy and tractable computational footprints. Continued progress is driven by innovations in expansion, regularization, cross-modal adaptation, and deployment-aware training, positioning sparse models at the core of modern retrieval infrastructure.