
Expanded-SPLADE: Enhanced Sparse Retrieval

Updated 3 December 2025
  • Expanded-SPLADE is a neural IR architecture that produces high-dimensional, sparse representations using vocabulary expansion and explicit sparsity regularization.
  • It leverages advanced pooling techniques and FLOPS/jFLOPS loss functions to balance retrieval effectiveness with computational efficiency.
  • The model integrates domain-adaptive pretraining and dynamic pruning strategies to support scalable, low-latency search across diverse datasets.

The Expanded-SPLADE model (often denoted as ESPLADE or simply SPLADE with vocabulary expansion) is a family of neural Information Retrieval (IR) architectures built on the principle of producing highly sparse, high-dimensional lexical embeddings for queries and documents, using neural term expansion and explicit sparsity regularization. It generalizes traditional SPLADE to larger and custom vocabularies and enhances retrieval quality while retaining the efficiency and interpretability of classic inverted-index-based search. Expanded-SPLADE underpins both domain-adaptive and web-scale IR systems, and includes variants based on encoder-only Transformers, decoder-only LLMs, and specialized initialization and pruning strategies (Formal et al., 2021, Iida et al., 2022, Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025, Doshi et al., 20 Aug 2024).

1. Model Architecture and Vocabulary Expansion

Expanded-SPLADE retains BERT-based or LLM-based token encoding but alters both the vocabulary and the projection mechanism. From a text input $T = (t_1, \dots, t_n)$, token embeddings $h_i$ are produced by a Transformer backbone. Each $h_i$ is projected (linear → GeLU → LayerNorm, or a language-modeling head) onto an output vocabulary $U$, which may be the BERT WordPiece vocabulary ($|V_0| = 32\,001$) or an expanded set of $|U| = 100\,000$ custom unigrams.

For each $u \in U$, token-wise logits $z_i[u] = W_u h_i + b_u$ are produced. To aggregate these into a sparse document or query vector $f_\theta(T) \in \mathbb{R}^{|U|}$, Expanded-SPLADE applies ReLU (for nonnegativity), log-saturation (for numeric stability), and pooling:

$$f_\theta(T)[u] = \max_{i=1,\dots,n} \log\bigl(1 + \operatorname{ReLU}(z_i[u])\bigr)$$

Pooling can be sum or max; max-pooling is empirically favored for expansion quality and efficacy (Formal et al., 2021, Kim et al., 20 Sep 2025).
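To make the projection-and-pooling step concrete, here is a minimal PyTorch sketch, assuming a Hugging Face BERT-style backbone; the module structure and names are illustrative rather than the authors' exact implementation.

```python
# Minimal sketch of Expanded-SPLADE encoding, assuming a Hugging Face
# BERT-style backbone; names and structure are illustrative.
import torch
import torch.nn as nn

class SpladeEncoder(nn.Module):
    def __init__(self, backbone, vocab_size: int):
        super().__init__()
        self.backbone = backbone                      # e.g. a BertModel
        hidden = backbone.config.hidden_size
        # linear -> GeLU -> LayerNorm, then projection onto the (expanded) vocabulary U
        self.transform = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.LayerNorm(hidden)
        )
        self.decoder = nn.Linear(hidden, vocab_size)  # z_i[u] = W_u h_i + b_u

    def forward(self, input_ids, attention_mask):
        h = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.decoder(self.transform(h))      # shape (batch, n, |U|)
        acts = torch.log1p(torch.relu(logits))        # log(1 + ReLU(.)) saturation
        acts = acts.masked_fill(attention_mask.unsqueeze(-1) == 0, 0.0)  # ignore padding
        return acts.max(dim=1).values                 # max-pool over positions -> f_theta(T)
```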

The expanded vocabulary is constructed by extracting the $M$ most frequent unigrams from the target corpus. Initialization of the new 100k-output MLM head can be either:

  • EMLM/ESPLADE initialization: mean-pool the subword vectors of each unigram from the base WordPiece embeddings, then pretrain on large in-domain masked-LM corpora (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025); a sketch follows this list.
  • Random initialization: initialize the new $W$, $b$ at random; this is empirically less effective under strict index-size constraints (Kim et al., 20 Sep 2025).
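The mean-pooling initialization can be sketched as follows; `tokenizer` and `base_embeddings` are hypothetical handles to the base model's WordPiece tokenizer and input-embedding matrix, and the function covers only the initialization step, not the subsequent in-domain MLM pretraining.

```python
# Sketch of EMLM/ESPLADE-style head initialization: each new unigram's output
# vector is the mean of its WordPiece subword embeddings from the base model.
# `tokenizer` and `base_embeddings` are hypothetical handles, not a fixed API.
import torch

def init_expanded_head(unigrams, tokenizer, base_embeddings: torch.Tensor) -> torch.Tensor:
    """Build a (|U|, hidden) weight matrix for the expanded MLM head."""
    rows = []
    for term in unigrams:  # the M most frequent unigrams from the target corpus
        sub_ids = tokenizer(term, add_special_tokens=False)["input_ids"]
        rows.append(base_embeddings[sub_ids].mean(dim=0))  # mean-pool subword vectors
    return torch.stack(rows)  # to be refined by in-domain masked-LM pretraining
```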

2. Sparse Regularization and Expansion Mechanisms

To maintain sparsity, Expanded-SPLADE implements both architectural and explicit loss-based mechanisms:

  • Log-saturation: the $\log(1 + \operatorname{ReLU}(\cdot))$ transform damps heavy activations and suppresses weak ones, inducing sparsity by construction and stabilizing against spurious expansions.
  • FLOPS loss (original and joint): for a batch of encodings $f_\theta(x)$, the original SPLADE applies a FLOPS penalty per view (query or document) via

$$\mathcal{L}_{\mathrm{FLOPS}}(T) = \|\bar w^{(T)}\|^2, \quad \bar w^{(T)} = \frac{1}{|T|}\sum_{x\in T} f_\theta(x)$$

where $T$ denotes all queries or all documents in a batch (Formal et al., 2021). The ESPLADE variant introduces the joint-FLOPS loss

$$\mathcal{L}_{\mathrm{jFLOPS}}(Q, D) = \bar w^{(Q)} \cdot \bar w^{(D)}$$

directly penalizing average pairwise overlap to further align sparsity between relevant query/document pairs (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).

Additional $\ell_1$ regularization can be used to encourage further sparsification but is often secondary to the FLOPS criterion (Formal et al., 2021). For LLM-based SPLADE (Mistral-SPLADE), a simple $L_1$-style FLOPS penalty is used with ramp-up scheduling (Doshi et al., 20 Aug 2024).
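A minimal sketch of the two penalties as defined above; `q_reps` and `d_reps` are batches of sparse vectors $f_\theta(x)$ with shape (batch, |U|). This follows the formulas in the text, not any reference codebase.

```python
import torch

def flops_loss(reps: torch.Tensor) -> torch.Tensor:
    """Original per-view FLOPS penalty: squared L2 norm of the batch-mean vector."""
    w_bar = reps.mean(dim=0)        # average activation per vocabulary term
    return (w_bar ** 2).sum()

def jflops_loss(q_reps: torch.Tensor, d_reps: torch.Tensor) -> torch.Tensor:
    """Joint-FLOPS: dot product of query and document batch-mean vectors."""
    return torch.dot(q_reps.mean(dim=0), d_reps.mean(dim=0))
```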

3. Training Schemes

Expanded-SPLADE training combines a ranking loss with sparsity regularization (FLOPS/jFLOPS) and, optionally, a distillation loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{rank}} + \lambda\,\mathcal{L}_{\mathrm{jFLOPS}}(Q,D)$$
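As a sketch, assuming an in-batch-negative cross-entropy as the ranking loss (the papers also use distillation variants), the combined objective can be written as:

```python
import torch
import torch.nn.functional as F

def training_loss(q_reps: torch.Tensor, d_reps: torch.Tensor, lam: float) -> torch.Tensor:
    # L_rank: in-batch-negative cross-entropy over dot-product scores
    # (an assumption; distillation losses can be added analogously)
    scores = q_reps @ d_reps.T
    labels = torch.arange(q_reps.size(0), device=q_reps.device)
    rank_loss = F.cross_entropy(scores, labels)
    # L_jFLOPS as defined in Section 2
    jflops = torch.dot(q_reps.mean(dim=0), d_reps.mean(dim=0))
    return rank_loss + lam * jflops
```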

4. Pruning and Inference Efficiency

Efficient retrieval is supported by several static and dynamic pruning mechanisms:

  • Document-centric static pruning: each document vector is truncated to its top-$k$ nonzero coordinates, reducing index size and traversal overhead (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).
  • Top-$\ell$ query term selection: at query time, only the $\ell$ largest-weight coordinates in the query vector participate in scoring (Won et al., 27 Nov 2025).
  • Boolean filtering with term-match threshold: retrieval can be restricted to documents matching a fraction $\tau$ of active query terms (Won et al., 27 Nov 2025).
  • IDF reweighting: In domain-adaptive scenarios, document vectors are reweighted by domain-specific inverse document frequency, correcting for rare or newly-tokenized terms (Iida et al., 2022).

These mechanisms directly affect average FLOPS (i.e., expected posting-list accesses) and retrieval latency. Empirically, aggressive document ($k=10$), query ($\ell=7$), and Boolean-threshold ($\tau \approx 0.4$) pruning in expanded-sparse setups can achieve BM25-like latency while retaining over 90% of Expanded-SPLADE's effectiveness improvement (Won et al., 27 Nov 2025); a sketch of these controls follows.
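The sketch below implements the three pruning controls on dense views of the sparse vectors for clarity (a production system would apply them directly on the inverted index); the default values mirror the settings quoted above.

```python
import torch

def prune_doc(d: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Document-centric static pruning: keep only the top-k coordinates."""
    top = torch.topk(d, k)
    return torch.zeros_like(d).scatter(0, top.indices, top.values)

def prune_query(q: torch.Tensor, l: int = 7) -> torch.Tensor:
    """Top-l query term selection at query time."""
    top = torch.topk(q, l)
    return torch.zeros_like(q).scatter(0, top.indices, top.values)

def passes_boolean_filter(q: torch.Tensor, d: torch.Tensor, tau: float = 0.4) -> bool:
    """Require a document to match at least a fraction tau of active query terms."""
    active = q > 0
    matched = (active & (d > 0)).sum().item()
    return matched >= tau * active.sum().item()
```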

5. Empirical Effectiveness and Efficiency

Expanded-SPLADE has been evaluated at web, billion-document, and out-of-domain scales. Key results:

  • On 20M-title corpora: ESPLADE (100K vocabulary, no pruning) achieves MRR@10 = 0.2549 at FLOPS = 0.0100; further pruning (e.g., $q_k=5$, $d_k=10$) reaches BM25 FLOPS levels (~0.0022) with only a minor drop in MRR@10 (0.2510) (Kim et al., 20 Sep 2025, Won et al., 27 Nov 2025).
  • On 9B-title sets: ESPLADE achieves SSS@10 = 0.7642 (semantic similarity) at 1.20 s/query; with pruning ($k=10$, $\ell=7$), latency drops to 0.85 s/query with SSS@10 = 0.7725 (Won et al., 27 Nov 2025).
  • Domain adaptation: With AdaLM-based expansion and IDF reweighting, SPLADE+CAI raises zero-shot nDCG@10 from 0.462 (original SPLADE) to 0.497 on BEIR out-of-distribution datasets (Iida et al., 2022).
  • LLM-based SPLADE (Mistral/Echo-Mistral-SPLADE) sets state-of-the-art BEIR zero-shot nDCG@10=55.07 versus dense (ColBERTv2, 49.95) and earlier ESPLADE (50.72), using a decoder-only architecture and aggressive pruning (Doshi et al., 20 Aug 2024).

A table summarizing key trade-offs appears below:

| Model Variant | MRR@10 / nDCG@10 | FLOPS (or s/query) | Key Setting / Pruning |
|---|---|---|---|
| SPLADE (32k, no prune) | 0.2733 | 0.0217 | MS MARCO titles |
| ESPLADE (100k, no prune) | 0.2549 | 0.0100 | 20M titles |
| ESPLADE (pruned) | 0.2510 | 0.0022 | $q_k=5$, $d_k=10$; ≈BM25 FLOPS |
| SPLADE+CAI (domain) | 0.497 (nDCG@10) | - | BEIR OOD, domain-adaptive |
| Echo-Mistral-SPLADE | 0.5507 (nDCG@10) | - | BEIR-13 zero-shot |
| BM25 | 0.2030 | 0.0027 | MS MARCO titles |
| ESPLADE (9B; pruned) | 0.7725 (SSS@10) | 0.85 s/query | $k=10$, $\ell=7$ |

6. Domain Adaptation and Vocabulary Dynamics

Expanded-SPLADE addresses domain shift via two mechanisms:

  • Vocabulary expansion explicitly augments the embedding/decoder to include newly relevant domain terms, initialized via subword decomposition and refined by continued masked-LM training. Simple expansion without the MLM update is ineffective; combining vocabulary expansion with continued pretraining is essential (Iida et al., 2022, Kim et al., 20 Sep 2025).
  • IDF reweighting corrects for distributional (term-frequency) mismatches, recovering the importance of low-frequency and rare terms and yielding gains on queries heavy in rare terms; a sketch follows this list.
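As a sketch of the reweighting step, assuming per-coordinate document-frequency counts `df` over the target corpus are available and using a standard BM25-style smoothed IDF (the exact weighting in Iida et al., 2022 may differ):

```python
import torch

def idf_reweight(doc_reps: torch.Tensor, df: torch.Tensor, n_docs: int) -> torch.Tensor:
    """Scale each vocabulary coordinate of (batch, |U|) document vectors by domain IDF."""
    idf = torch.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # BM25-style smoothing (assumed)
    return doc_reps * idf  # boosts rare / newly-tokenized domain terms
```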

This strategy mitigates the “vocabulary gap” and the “frequency gap” that plague neural IR models in low-resource or out-of-domain settings (Iida et al., 2022).

A plausible implication is that vocabulary size directly modulates the model's representational specificity; moving from 32k to 100k outputs enables fine-grained, low-latency retrieval, especially with static or aggressive pruning (Kim et al., 20 Sep 2025).

7. Variations and Future Directions

Recent works have extended Expanded-SPLADE along the following axes:

  • Decoder-only LLM (Mistral-SPLADE): Leverages the broader pretraining of large causal models and echo embeddings to yield higher-quality expansions. LoRA-based finetuning achieves state-of-the-art sparse retrieval, despite not using hard negatives or distillation (Doshi et al., 20 Aug 2024).
  • Conversational Sparse Retrieval (CoSPLADE): Applies SPLADE expansions to multi-turn conversational search, combining query and answer expansions into a context-aware sparse vector and integrating with reranking (Hai et al., 2023).
  • Static and dynamic pruning strategies: Enable end-to-end control over efficiency/effectiveness trade-offs for billion-scale or latency-sensitive deployment (Won et al., 27 Nov 2025).
  • Joint regularization (jFLOPS) and batchwise mask control: Enhance alignment between query and document sparsity, improving practical search overheads (Kim et al., 20 Sep 2025).

Expanded-SPLADE’s structured vocabulary growth and pruning define a new spectrum between neural semantic retrievers and classic BoW, supporting scalable and adaptive search system design.
