
Expanded-SPLADE: Enhanced Sparse Retrieval

Updated 3 December 2025
  • Expanded-SPLADE is a neural IR architecture that produces high-dimensional, sparse representations using vocabulary expansion and explicit sparsity regularization.
  • It leverages advanced pooling techniques and FLOPS/jFLOPS loss functions to balance retrieval effectiveness with computational efficiency.
  • The model integrates domain-adaptive pretraining and dynamic pruning strategies to support scalable, low-latency search across diverse datasets.

The Expanded-SPLADE model (often denoted as ESPLADE or simply SPLADE with vocabulary expansion) is a family of neural Information Retrieval (IR) architectures built on the principle of producing highly sparse, high-dimensional lexical embeddings for queries and documents, using neural term expansion and explicit sparsity regularization. It generalizes traditional SPLADE to larger and custom vocabularies and enhances retrieval quality while retaining the efficiency and interpretability of classic inverted-index-based search. Expanded-SPLADE underpins both domain-adaptive and web-scale IR systems, and includes variants based on encoder-only Transformers, decoder-only LLMs, and specialized initialization and pruning strategies (Formal et al., 2021, Iida et al., 2022, Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025, Doshi et al., 20 Aug 2024).

1. Model Architecture and Vocabulary Expansion

Expanded-SPLADE retains BERT-based or LLM-based token encoding but alters both the vocabulary and the projection mechanism. From a text input $T = (t_1, \dots, t_n)$, token embeddings $h_i$ are produced by a Transformer backbone. Each $h_i$ is projected (linear → GeLU → LayerNorm, or a language-modeling head) onto an output vocabulary $U$, which may be the BERT WordPiece vocabulary ($|V_0| = 32\,001$) or an expanded set of $|U| = 100\,000$ custom unigrams.

For each $u \in U$, token-wise logits $z_i[u] = W_u h_i + b_u$ are produced. To aggregate these into a sparse document or query vector $f_\theta(T) \in \mathbb{R}^{|U|}$, Expanded-SPLADE applies ReLU (for nonnegativity), log-saturation (for numeric stability), and pooling:

$$f_\theta(T)[u] = \max_{i=1,\dots,n} \log\bigl(1 + \operatorname{ReLU}(z_i[u])\bigr)$$

Pooling can be sum or max; max-pooling is empirically favored for expansion quality and efficacy (Formal et al., 2021, Kim et al., 20 Sep 2025).
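To make the projection-and-pooling step concrete, here is a minimal PyTorch sketch, assuming a Hugging Face BERT-style backbone; the module structure and names are illustrative rather than the authors' exact implementation.

```python
# Minimal sketch of Expanded-SPLADE encoding, assuming a Hugging Face
# BERT-style backbone; names and structure are illustrative.
import torch
import torch.nn as nn

class SpladeEncoder(nn.Module):
    def __init__(self, backbone, vocab_size: int):
        super().__init__()
        self.backbone = backbone                      # e.g. a BertModel
        hidden = backbone.config.hidden_size
        # linear -> GeLU -> LayerNorm, then projection onto the (expanded) vocabulary U
        self.transform = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.LayerNorm(hidden)
        )
        self.decoder = nn.Linear(hidden, vocab_size)  # z_i[u] = W_u h_i + b_u

    def forward(self, input_ids, attention_mask):
        h = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.decoder(self.transform(h))      # shape (batch, n, |U|)
        acts = torch.log1p(torch.relu(logits))        # log(1 + ReLU(.)) saturation
        acts = acts.masked_fill(attention_mask.unsqueeze(-1) == 0, 0.0)  # ignore padding
        return acts.max(dim=1).values                 # max-pool over positions -> f_theta(T)
```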

The expanded vocabulary is constructed by extracting the $M$ most frequent unigrams from the target corpus. Initialization of the new 100k-output MLM head can be either:

  • EMLM/ESPLADE initialization: mean-pool the subword vectors of each unigram from the base WordPiece embeddings, then pretrain on large in-domain masked-LM corpora (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025); a sketch follows this list.
  • Random initialization: initialize the new $W$, $b$ at random; this is empirically less effective under strict index-size constraints (Kim et al., 20 Sep 2025).
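The mean-pooling initialization can be sketched as follows; `tokenizer` and `base_embeddings` are hypothetical handles to the base model's WordPiece tokenizer and input-embedding matrix, and the function covers only the initialization step, not the subsequent in-domain MLM pretraining.

```python
# Sketch of EMLM/ESPLADE-style head initialization: each new unigram's output
# vector is the mean of its WordPiece subword embeddings from the base model.
# `tokenizer` and `base_embeddings` are hypothetical handles, not a fixed API.
import torch

def init_expanded_head(unigrams, tokenizer, base_embeddings: torch.Tensor) -> torch.Tensor:
    """Build a (|U|, hidden) weight matrix for the expanded MLM head."""
    rows = []
    for term in unigrams:  # the M most frequent unigrams from the target corpus
        sub_ids = tokenizer(term, add_special_tokens=False)["input_ids"]
        rows.append(base_embeddings[sub_ids].mean(dim=0))  # mean-pool subword vectors
    return torch.stack(rows)  # to be refined by in-domain masked-LM pretraining
```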

2. Sparse Regularization and Expansion Mechanisms

To maintain sparsity, Expanded-SPLADE implements both architectural and explicit loss-based mechanisms:

  • Log-saturation: the $\log(1 + \operatorname{ReLU}(\cdot))$ transform damps heavy activations and suppresses weak ones, inducing sparsity by construction and stabilizing against spurious expansions.
  • FLOPS loss (original and joint): for a batch of encodings $f_\theta(x)$, the original SPLADE applies a FLOPS penalty per view (query or document) via

$$\mathcal{L}_{\mathrm{FLOPS}}(T) = \|\bar w^{(T)}\|^2, \quad \bar w^{(T)} = \frac{1}{|T|}\sum_{x\in T} f_\theta(x)$$

where $T$ denotes all queries or all documents in a batch (Formal et al., 2021). The ESPLADE variant introduces the joint-FLOPS loss

$$\mathcal{L}_{\mathrm{jFLOPS}}(Q, D) = \bar w^{(Q)} \cdot \bar w^{(D)}$$

directly penalizing average pairwise overlap to further align sparsity between relevant query/document pairs (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).

Additional $\ell_1$ regularization can be used to encourage further sparsification but is often secondary to the FLOPS criterion (Formal et al., 2021). For LLM-based SPLADE (Mistral-SPLADE), a simple $L_1$-style FLOPS penalty is used with ramp-up scheduling (Doshi et al., 20 Aug 2024).
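A minimal sketch of the two penalties as defined above; `q_reps` and `d_reps` are batches of sparse vectors $f_\theta(x)$ with shape (batch, |U|). This follows the formulas in the text, not any reference codebase.

```python
import torch

def flops_loss(reps: torch.Tensor) -> torch.Tensor:
    """Original per-view FLOPS penalty: squared L2 norm of the batch-mean vector."""
    w_bar = reps.mean(dim=0)        # average activation per vocabulary term
    return (w_bar ** 2).sum()

def jflops_loss(q_reps: torch.Tensor, d_reps: torch.Tensor) -> torch.Tensor:
    """Joint-FLOPS: dot product of query and document batch-mean vectors."""
    return torch.dot(q_reps.mean(dim=0), d_reps.mean(dim=0))
```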

3. Training Schemes

Expanded-SPLADE training combines a ranking loss with sparsity regularization (FLOPS/jFLOPS) and, optionally, a distillation loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{rank}} + \lambda\,\mathcal{L}_{\mathrm{jFLOPS}}(Q,D)$$
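As a sketch, assuming an in-batch-negative cross-entropy as the ranking loss (the papers also use distillation variants), the combined objective can be written as:

```python
import torch
import torch.nn.functional as F

def training_loss(q_reps: torch.Tensor, d_reps: torch.Tensor, lam: float) -> torch.Tensor:
    # L_rank: in-batch-negative cross-entropy over dot-product scores
    # (an assumption; distillation losses can be added analogously)
    scores = q_reps @ d_reps.T
    labels = torch.arange(q_reps.size(0), device=q_reps.device)
    rank_loss = F.cross_entropy(scores, labels)
    # L_jFLOPS as defined in Section 2
    jflops = torch.dot(q_reps.mean(dim=0), d_reps.mean(dim=0))
    return rank_loss + lam * jflops
```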

4. Pruning and Inference Efficiency

Efficient retrieval is supported by several static and dynamic pruning mechanisms:

  • Document-centric static pruning: each document vector is truncated to its top-$k$ nonzero coordinates, reducing index size and traversal overhead (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).
  • Top-$\ell$ query term selection: at query time, only the $\ell$ largest-weight coordinates in the query vector participate in scoring (Won et al., 27 Nov 2025).
  • Boolean filtering with term-match threshold: retrieval can be restricted to documents matching a fraction $\tau$ of active query terms (Won et al., 27 Nov 2025).
  • IDF reweighting: In domain-adaptive scenarios, document vectors are reweighted by domain-specific inverse document frequency, correcting for rare or newly-tokenized terms (Iida et al., 2022).

These mechanisms directly affect average FLOPS (i.e., expected posting-list accesses) and retrieval latency. Empirically, aggressive document ($k=10$), query ($\ell=7$), and Boolean-threshold ($\tau \approx 0.4$) pruning in expanded-sparse setups can achieve BM25-like latency while retaining over 90% of Expanded-SPLADE's effectiveness improvement (Won et al., 27 Nov 2025); a sketch of these controls follows.
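The sketch below implements the three pruning controls on dense views of the sparse vectors for clarity (a production system would apply them directly on the inverted index); the default values mirror the settings quoted above.

```python
import torch

def prune_doc(d: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Document-centric static pruning: keep only the top-k coordinates."""
    top = torch.topk(d, k)
    return torch.zeros_like(d).scatter(0, top.indices, top.values)

def prune_query(q: torch.Tensor, l: int = 7) -> torch.Tensor:
    """Top-l query term selection at query time."""
    top = torch.topk(q, l)
    return torch.zeros_like(q).scatter(0, top.indices, top.values)

def passes_boolean_filter(q: torch.Tensor, d: torch.Tensor, tau: float = 0.4) -> bool:
    """Require a document to match at least a fraction tau of active query terms."""
    active = q > 0
    matched = (active & (d > 0)).sum().item()
    return matched >= tau * active.sum().item()
```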

5. Empirical Effectiveness and Efficiency

Expanded-SPLADE has been evaluated at web, billion-document, and out-of-domain scales. Key results:

  • On 20M-title corpora: ESPLADE (100K vocabulary, no pruning) achieves MRR@10 = 0.2549 at FLOPS = 0.0100; further pruning (e.g., $q_k=5$, $d_k=10$) reaches BM25 FLOPS levels (~0.0022) with only a minor drop in MRR@10 (0.2510) (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).
  • On 9B-title sets: ESPLADE achieves SSS@10 = 0.7642 (semantic similarity) at 1.20 s/query; with pruning ($k=10$, $\ell=7$), latency drops to 0.85 s/query with SSS@10 = 0.7725 (Won et al., 27 Nov 2025).
  • Domain adaptation: With AdaLM-based expansion and IDF reweighting, SPLADE+CAI raises zero-shot nDCG@10 from 0.462 (original SPLADE) to 0.497 on BEIR out-of-distribution datasets (Iida et al., 2022).
  • LLM-based SPLADE (Mistral/Echo-Mistral-SPLADE) sets state-of-the-art BEIR zero-shot nDCG@10=55.07 versus dense (ColBERTv2, 49.95) and earlier ESPLADE (50.72), using a decoder-only architecture and aggressive pruning (Doshi et al., 20 Aug 2024).

A table summarizing key trade-offs appears below:

| Model Variant | MRR@10 / nDCG@10 | FLOPS (or s/query) | Key Setting / Pruning |
|---|---|---|---|
| SPLADE (32k, no prune) | 0.2733 | 0.0217 | MS MARCO titles |
| ESPLADE (100k, no prune) | 0.2549 | 0.0100 | 20M titles |
| ESPLADE (pruned) | 0.2510 | 0.0022 | $q_k=5$, $d_k=10$; ≈BM25 FLOPS |
| SPLADE+CAI (domain) | 0.497 (nDCG@10) | - | BEIR OOD, domain-adaptive |
| Echo-Mistral-SPLADE | 0.5507 (nDCG@10) | - | BEIR-13 zero-shot |
| BM25 | 0.2030 | 0.0027 | MS MARCO titles |
| ESPLADE (9B; pruned) | 0.7725 (SSS@10) | 0.85 s/query | $k=10$, $\ell=7$ |

6. Domain Adaptation and Vocabulary Dynamics

Expanded-SPLADE addresses domain shift via two mechanisms:

  • Vocabulary expansion explicitly augments the embedding/decoder to include newly relevant domain terms, initialized via subword decomposition and refined by continued masked-LM training. Simple expansion without the MLM update is ineffective; combining vocabulary expansion with continued pretraining is essential (Iida et al., 2022, Kim et al., 20 Sep 2025).
  • IDF reweighting corrects for distributional (term-frequency) mismatches, recovering the importance of low-frequency and rare terms and yielding gains on queries heavy in rare terms; a sketch follows this list.
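As a sketch of the reweighting step, assuming per-coordinate document-frequency counts `df` over the target corpus are available and using a standard BM25-style smoothed IDF (the exact weighting in Iida et al., 2022 may differ):

```python
import torch

def idf_reweight(doc_reps: torch.Tensor, df: torch.Tensor, n_docs: int) -> torch.Tensor:
    """Scale each vocabulary coordinate of (batch, |U|) document vectors by domain IDF."""
    idf = torch.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # BM25-style smoothing (assumed)
    return doc_reps * idf  # boosts rare / newly-tokenized domain terms
```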

This strategy mitigates the “vocabulary gap” and the “frequency gap” that plague neural IR models in low-resource or out-of-domain settings (Iida et al., 2022).

A plausible implication is that vocabulary size directly modulates the model's representational specificity; moving from 32k to 100k outputs enables fine-grained, low-latency retrieval, especially with static or aggressive pruning (Kim et al., 20 Sep 2025).

7. Variations and Future Directions

Recent works have extended Expanded-SPLADE along the following axes:

  • Decoder-only LLM (Mistral-SPLADE): Leverages the broader pretraining of large causal models and echo embeddings to yield higher-quality expansions. LoRA-based finetuning achieves state-of-the-art sparse retrieval, despite not using hard negatives or distillation (Doshi et al., 20 Aug 2024).
  • Conversational Sparse Retrieval (CoSPLADE): Applies SPLADE expansions to multi-turn conversational search, combining query and answer expansions into a context-aware sparse vector and integrating with reranking (Hai et al., 2023).
  • Static and dynamic pruning strategies: Enable end-to-end control over efficiency/effectiveness trade-offs for billion-scale or latency-sensitive deployment (Won et al., 27 Nov 2025).
  • Joint regularization (jFLOPS) and batchwise mask control: Enhance alignment between query and document sparsity, improving practical search overheads (Kim et al., 20 Sep 2025).

Expanded-SPLADE’s structured vocabulary growth and pruning define a new spectrum between neural semantic retrievers and classic BoW, supporting scalable and adaptive search system design.
