SPLADE-doc Variant in Sparse Retrieval
- The paper introduces DF-FLOPS regularization, enabling aggressive sparsity control while maintaining retrieval quality in large-scale search.
- It employs a Transformer-based encoder and log-saturated aggregation to precompute efficient, term-weighted document representations.
- Empirical results demonstrate a 4.5× reduction in candidate matches and a 10× latency improvement compared to standard FLOPS.
SPLADE-doc is a learned sparse retrieval (LSR) model variant that eliminates the neural query encoder and focuses entirely on pre-computing document-side sparse representations. Designed to leverage inverted index infrastructure, SPLADE-doc produces term-weighted, log-saturated, and aggressively sparse vectors for documents, enabling high-throughput and low-latency retrieval comparable to classical bag-of-words models. Several key advances—including precise sparsity control, novel regularization techniques (notably DF-FLOPS), and representation thresholding—have extended its efficiency and practicality for large-scale, real-world search deployments.
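The retrieval side described above can be sketched in a few lines: documents carry precomputed sparse term weights served from an inverted index, and a query is scored as a bag of tokens with unit weights. The documents, terms, and weights below are illustrative toy data, not values from the paper.

```python
# Sketch of SPLADE-doc style retrieval over an inverted index.
# Document vectors and weights are made-up toy data.
from collections import defaultdict

# Precomputed sparse document vectors: doc_id -> {term: weight}
doc_vectors = {
    "d1": {"neural": 1.8, "search": 0.9, "index": 0.4},
    "d2": {"search": 1.2, "latency": 1.5},
    "d3": {"neural": 0.25, "latency": 0.75, "index": 1.1},
}

# Build an inverted index: term -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in doc_vectors.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def score(query_tokens, index):
    """Score documents for a bag-of-tokens query with unit term weights:
    score(q, d) = sum over t in q of the precomputed weight of t in d."""
    scores = defaultdict(float)
    for term in query_tokens:
        for doc_id, w in index.get(term, []):
            scores[doc_id] += w
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranking = score(["neural", "latency"], index)
```

Because no query encoder runs at search time, latency is dominated by posting-list traversal, which is exactly what the DF-FLOPS regularization described below targets.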
1. SPLADE-doc Architectural Principles
SPLADE-doc inherits its core architecture from SPLADE, which couples a Transformer-based encoder (such as BERT or DistilBERT) with a masked language modeling (MLM)–style "expansion head." For each document $d$, the encoder computes a hidden vector $h_i$ for each token position $i$, then projects it into vocabulary space via

$$s_{ij} = h_i^\top e_j + b_j,$$

where $e_j$ is the embedding for vocabulary term $j$ and $b_j$ is a learned bias. To promote sparsity, document term weights are aggregated with a log-saturation:

$$w_j^d = \max_{i \in d} \log\bigl(1 + \mathrm{ReLU}(s_{ij})\bigr).$$

Unlike the full SPLADE, SPLADE-doc omits any query-side encoding: at retrieval, the query is a simple bag of tokens, each assigned unit weight, and the document score is the sum of the corresponding precomputed weights $w_j^d$ (Formal et al., 2021; Lassance et al., 2022).
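The log-saturated aggregation can be shown concretely. The sketch below assumes per-token vocabulary logits are already computed by the encoder's MLM head; the logit values and the tiny three-term vocabulary are invented for illustration (SPLADE v1 pooled with a sum, later variants with a max).

```python
import math

def splade_doc_weights(token_logits, pooling="max"):
    """Aggregate per-token vocabulary logits s_ij into document term
    weights w_j^d = pool_i log(1 + ReLU(s_ij)).  `token_logits` is a
    list of per-token logit lists (tokens x vocab); values are toy data."""
    vocab_size = len(token_logits[0])
    weights = []
    for j in range(vocab_size):
        acts = [math.log1p(max(0.0, s[j])) for s in token_logits]
        weights.append(max(acts) if pooling == "max" else sum(acts))
    return weights

# Two token positions over a toy 3-term vocabulary.
logits = [[2.0, -1.0, 0.5],
          [0.0,  3.0, 0.5]]
w = splade_doc_weights(logits)
```

The ReLU zeroes out negative logits (producing exact sparsity) and the `log1p` dampens very large activations so no single term dominates the document vector.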
2. FLOPS and DF-FLOPS Regularization
Term-level sparsity is induced through the FLOPS regularizer, which minimizes the squared mean activation per term over the batch:

$$\ell_{\mathrm{FLOPS}} = \sum_{j \in V} \Bigl(\frac{1}{N}\sum_{d=1}^{N} w_j^{d}\Bigr)^{2},$$

where $w_j^d$ is the weight of term $j$ in document $d$ and $N$ is the batch size. However, standard FLOPS only controls "per-document" sparsity and does not prevent certain terms (such as stopwords) from accruing high document frequency (DF), resulting in long posting lists and increased retrieval latency.
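The FLOPS penalty is straightforward to compute; a minimal sketch over a toy batch (made-up weights, stdlib only):

```python
def flops_penalty(batch_weights):
    """FLOPS regularizer: for each vocabulary term j, square the mean
    activation over the N documents in the batch, then sum over terms:
        l_FLOPS = sum_j ((1/N) sum_d w_j^d)^2
    `batch_weights` is an N x |V| list of non-negative term weights."""
    n = len(batch_weights)
    vocab = len(batch_weights[0])
    total = 0.0
    for j in range(vocab):
        mean_act = sum(doc[j] for doc in batch_weights) / n
        total += mean_act ** 2
    return total

# Toy batch: 2 documents, 3-term vocabulary.
batch = [[1.0, 0.0, 2.0],
         [1.0, 0.0, 0.0]]
penalty = flops_penalty(batch)
```

Squaring the mean (rather than summing absolute values) pushes the optimizer toward an even distribution of activations across terms, which is what makes posting lists short on average.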
DF-FLOPS addresses this by introducing a DF-dependent penalty:

$$\ell_{\mathrm{DF\text{-}FLOPS}} = \sum_{j \in V} \sigma(\mathrm{DF}_j)\,\Bigl(\frac{1}{N}\sum_{d=1}^{N} w_j^{d}\Bigr)^{2}.$$

The smooth cutoff function $\sigma(\cdot)$, typically a generalized logistic, ensures that terms appearing in a large fraction of the corpus (e.g., DF > 10%) are heavily penalized for high average activation. When $\sigma \equiv 1$, DF-FLOPS reduces to standard FLOPS (Porco et al., 21 May 2025).
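A minimal sketch of this idea, assuming a plain logistic cutoff; the exact functional form and constants (`threshold`, `steepness`) are illustrative choices, not the paper's:

```python
import math

def df_cutoff(df, threshold=0.10, steepness=200.0):
    """Smooth DF-dependent multiplier sigma(DF_j): a logistic that stays
    near 0 for rare terms and approaches 1 once a term's document
    frequency exceeds `threshold` (e.g. 10% of the corpus).  The
    constants here are illustrative, not taken from the paper."""
    return 1.0 / (1.0 + math.exp(-steepness * (df - threshold)))

def df_flops_penalty(batch_weights, doc_freqs):
    """DF-FLOPS: the per-term FLOPS penalty scaled by sigma(DF_j), so
    only high-DF terms pay the full price for large mean activations."""
    n = len(batch_weights)
    total = 0.0
    for j, df in enumerate(doc_freqs):
        mean_act = sum(doc[j] for doc in batch_weights) / n
        total += df_cutoff(df) * mean_act ** 2
    return total

# Two toy documents; term 0 is rare (DF 1%), term 1 is frequent (DF 50%).
penalty = df_flops_penalty([[1.0, 1.0], [1.0, 1.0]], doc_freqs=[0.01, 0.50])
```

With equal activations, the rare term contributes almost nothing while the frequent term is penalized at nearly full FLOPS strength, which is exactly the asymmetry that shortens posting lists without a static stopword list.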
3. Integrating DF-FLOPS in the SPLADE-doc Pipeline
DF-FLOPS regularization slots seamlessly into the standard SPLADE-doc training procedure. Documents are encoded through a DistilBERT-based encoder. The regularization strength $\lambda$ is ramped quadratically during a 50k-step pretraining phase. Every 100 validation steps, empirical DF estimates $\mathrm{DF}_j$ are refreshed using held-out mini-batch statistics. The per-term penalty is then computed, and DF-FLOPS simply scales each term's regularization gradient by $\sigma(\mathrm{DF}_j)$, entailing no additional backward passes beyond standard automatic differentiation. The overall loss is

$$\mathcal{L} = \mathcal{L}_{\mathrm{rank}} + \lambda\,\ell_{\mathrm{DF\text{-}FLOPS}},$$

making the approach a drop-in replacement for FLOPS in existing codebases and pipelines (Porco et al., 21 May 2025).
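The quadratic ramp of the regularization strength can be sketched as a simple schedule function; the interface (`lambda_max`, `ramp_steps`) is a hypothetical illustration of the warm-up described above, not the paper's code.

```python
def lambda_schedule(step, lambda_max, ramp_steps=50_000):
    """Quadratically ramp the regularization strength from 0 to
    `lambda_max` over the first `ramp_steps` training steps, then hold.
    A quadratic ramp keeps the penalty weak early in training so the
    encoder learns useful term weights before sparsity pressure kicks in."""
    frac = min(step / ramp_steps, 1.0)
    return lambda_max * frac * frac
```

At each training step, the returned value multiplies the DF-FLOPS term in the total loss.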
4. Sparsity, Latency, and Index Efficiency
Adopting DF-FLOPS regularization yields significant improvements in posting-list lengths and latency. Empirical results on 8.8M MS MARCO passages indexed in Apache Solr v9 show:
| Model | Top-token DF% | Matches/query | Passage terms | Latency Avg (ms) | 99th %ile (ms) |
|---|---|---|---|---|---|
| SPLADE-Doc + FLOPS | 95.8% | 8.63M | 583.8 | 922 | 1945 |
| SPLADE-Doc + DF-FLOPS | 8.0% | 1.91M | 301.6 | 161 | 342 |
| BM25 | — | — | — | 68.9 | 241.3 |
DF-FLOPS pushes the majority of high-frequency tokens below 10% document frequency, reducing the mean number of candidate documents per query (roughly 4.5× fewer matches) and shrinking the average passage vector length by nearly half. This cuts end-to-end retrieval latency by more than 5× (922 ms → 161 ms) and brings p99 latency close to BM25's, without wholesale stopword removal: critical terms with high DF can still be retained if salient (Porco et al., 21 May 2025).
5. Empirical Performance and Benchmarking
Table 1 from (Porco et al., 21 May 2025) summarizes in-domain effectiveness and speed on MS MARCO and TREC:
| Model | MRR@10 | Recall@1K | Latency Avg (ms) |
|---|---|---|---|
| BM25 | 18.4 | 85.3 | 68.9 |
| SPLADE-Doc + FLOPS | 32.2 | 92.4 | 922.0 |
| + pruning@150 | 32.0 | 92.1 | 792.1 |
| FLOPS (larger λ) | 29.2 | 88.8 | 331.6 |
| FLOPS (largest λ) | 28.3 | 88.4 | 160.9 |
| DF-FLOPS | 30.0 | 92.9 | 161.0 |
| + pruning@150 | 29.7 | 93.0 | 87.8 |
DF-FLOPS achieves a recall equal to or slightly exceeding FLOPS, with a mean reciprocal rank (MRR@10) only marginally reduced (32.2 → 30.0). Applying modest pruning yields latencies competitive with BM25. On 13 BEIR zero-shot cross-domain tasks, DF-FLOPS outperforms standard FLOPS on 12/13 datasets and generally narrows the performance gap to the BM25 baseline (Porco et al., 21 May 2025).
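The "pruning@150" rows correspond to representation thresholding: keeping only the top-k weighted terms of each document vector before indexing. A minimal sketch with toy data:

```python
def prune_topk(doc_vector, k=150):
    """Representation thresholding: keep only the k largest-weight terms
    of a sparse document vector (dict of term -> weight).  k=150 matches
    the 'pruning@150' configuration reported in the tables above."""
    top = sorted(doc_vector.items(), key=lambda kv: -kv[1])[:k]
    return dict(top)

# Toy 4-term vector, pruned to its 2 strongest terms.
vec = {"a": 0.1, "b": 2.0, "c": 1.5, "d": 0.7}
pruned = prune_topk(vec, k=2)
```

Because low-weight terms contribute little to scores but still lengthen posting lists, this trades a small effectiveness drop for a further latency reduction.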
6. Trade-offs, Tuning, and Production Guidelines
- Regularization strength (λ): Since $\sigma(\mathrm{DF}_j)$ is numerically small for low-DF terms, viable values of λ can be substantially larger than for standard FLOPS without over-sparsifying; empirically, the method is robust across a wide range of λ.
- DF cutoff and steepness: Setting the cutoff at roughly 10% corpus DF with a sharp logistic transition effectively suppresses unwanted stopword-like terms; adjusting these parameters allows more aggressive or more conservative DF control.
- Effectiveness vs. latency: In-domain, a 2–4 point MRR@10 drop buys up to a 10× latency reduction. Cross-domain, DF-FLOPS often improves robustness and generalization relative to FLOPS, likely by limiting overfitting to dataset-specific high-DF tokens.
- Deployment recommendations: DF-FLOPS is recommended for all large-scale inverted-index deployments (e.g., Solr, Lucene) where p99 latency under 200ms is a requirement. For scenarios prioritizing maximum effectiveness (e.g., research benchmarks or two-stage reranking), standard FLOPS or relaxed regularization may be preferred (Porco et al., 21 May 2025).
7. Advances and Related Variants
Beyond DF-FLOPS, other mechanisms for efficiency include hybrid hard/soft thresholding on term weights (Qiao et al., 2023) and leveraging decoder-only LLMs with echo embeddings (as in Echo-Mistral-SPLADE (Doshi et al., 2024)). However, DF-FLOPS is distinctive in directly regularizing per-term posting-list length via corpus-level statistics rather than relying on static stopword lists or global mask pruning. This yields system-wide reductions in inverted-index workload without unduly sacrificing retrieval quality or the lexical specificity achievable with learned sparse retrieval.
SPLADE-doc and its DF-targeted regularization constitute a significant step in reconciling the demands of neural IR effectiveness with the strict latency regimes and index pressure of web-scale search. These advances underpin current state-of-the-art production LSR deployments, making the trade space between speed, index size, and search quality highly tunable for practical settings (Porco et al., 21 May 2025, Formal et al., 2021, Lassance et al., 2022).