
SPLADE-doc Variant in Sparse Retrieval

Updated 21 February 2026
  • The paper introduces DF-FLOPS regularization, enabling aggressive sparsity control while maintaining retrieval quality in large-scale search.
  • It employs a Transformer-based encoder and log-saturated aggregation to precompute efficient, term-weighted document representations.
  • Empirical results demonstrate a 4.5× reduction in candidate matches and a 10× latency improvement compared to standard FLOPS.

SPLADE-doc is a learned sparse retrieval (LSR) model variant that eliminates the neural query encoder and focuses entirely on pre-computing document-side sparse representations. Designed to leverage inverted index infrastructure, SPLADE-doc produces term-weighted, log-saturated, and aggressively sparse vectors for documents, enabling high-throughput and low-latency retrieval comparable to classical bag-of-words models. Several key advances—including precise sparsity control, novel regularization techniques (notably DF-FLOPS), and representation thresholding—have extended its efficiency and practicality for large-scale, real-world search deployments.

1. SPLADE-doc Architectural Principles

SPLADE-doc inherits its core architecture from SPLADE, which couples a Transformer-based encoder (such as BERT or DistilBERT) with a masked language modeling (MLM)–style "expansion head." For each document $d$, the encoder computes a hidden vector $h_i$ for each token position, then projects it into vocabulary space via

$$w_{ij} = \text{Transform}(h_i)^\top E_j + b_j$$

where $E_j$ is the embedding for term $j$ and $b_j$ is a learned bias. To promote sparsity, document term weights are aggregated as

$$w_j^d = \max_{i \in d} \log(1 + \text{ReLU}(w_{ij}))$$

Unlike full SPLADE, SPLADE-doc omits any query-side encoding: at retrieval time, the query is a simple bag of tokens, each assigned unit weight, and a document's score is the sum of the corresponding precomputed $w_j^d$ (Formal et al., 2021; Lassance et al., 2022).
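The document-side computation above can be sketched in a few lines. This is an illustrative NumPy mock-up, not the authors' implementation: the real model produces the per-token vocabulary logits $w_{ij}$ from Transformer hidden states via an MLM head, whereas here they are taken as given, and `splade_doc_encode`/`score` are hypothetical helper names.

```python
import numpy as np

def splade_doc_encode(logits):
    """Turn per-token vocabulary logits w_ij (seq_len x vocab_size) into a
    sparse document vector via log-saturation and max pooling:
    w_j^d = max_{i in d} log(1 + ReLU(w_ij))."""
    weights = np.log1p(np.maximum(logits, 0.0))
    return weights.max(axis=0)          # shape: (vocab_size,)

def score(query_token_ids, doc_rep):
    """Query side: a bag of tokens with unit weights, so the score is
    simply the sum of the precomputed document weights w_j^d."""
    return doc_rep[query_token_ids].sum()

# Toy example: 3 token positions over a vocabulary of 5 terms.
logits = np.array([[ 1.0, -2.0,  0.5, 0.0, -1.0],
                   [ 0.2,  3.0, -0.5, 0.0, -1.0],
                   [-1.0,  0.1,  2.0, 0.0, -1.0]])
doc_rep = splade_doc_encode(logits)     # negative logits contribute nothing
```

Because ReLU zeroes negative logits and max pooling keeps a single value per term, most vocabulary entries end up exactly zero, which is what makes the vectors inverted-index friendly.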

2. FLOPS and DF-FLOPS Regularization

Term-level sparsity is induced through the FLOPS regularizer, designed to minimize the average activation per term over the batch:

$$\ell_{\mathrm{FLOPS}} = \sum_{t \in V} \left( \frac{1}{N} \sum_{i=1}^N r_{i,t} \right)^2$$

where $r_{i,t}$ is the weight of term $t$ in document $i$. However, standard FLOPS only controls per-document sparsity and does not prevent certain terms (such as stopwords) from accruing high document frequency (DF), resulting in long posting lists and increased retrieval latency.

DF-FLOPS addresses this by introducing a DF-dependent penalty weight

$$w_t = \text{activ}\left( \frac{DF_t}{|C|}; \alpha, \beta \right)$$

$$\ell_{\mathrm{DF\text{-}FLOPS}} = \sum_{t \in V} \left( w_t \cdot \frac{1}{N} \sum_{i=1}^N r_{i,t} \right)^2$$

The smooth cutoff function $\text{activ}(x; \alpha, \beta)$, typically a generalized logistic, ensures that terms appearing in a large fraction of the corpus (e.g., DF $>$ 10%) are heavily penalized for high average activation. When $w_t \equiv 1$, DF-FLOPS reduces to standard FLOPS (Porco et al., 21 May 2025).
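The two regularizers differ only in the per-term weight $w_t$. A minimal NumPy sketch, assuming a generalized-logistic form for activ (the paper's exact parameterization may differ):

```python
import numpy as np

def activ(x, alpha=0.1, beta=10.0):
    """Assumed generalized-logistic DF cutoff: ~0 for x well below alpha,
    ~1 for x well above it, with beta controlling the sharpness."""
    return 1.0 / (1.0 + np.exp(-beta * (x - alpha) / alpha))

def flops_loss(reps):
    """Standard FLOPS: squared mean activation per vocabulary term.
    reps: (N, vocab_size) batch of document term weights r_{i,t}."""
    return float(np.sum(reps.mean(axis=0) ** 2))

def df_flops_loss(reps, df_frac, alpha=0.1, beta=10.0):
    """DF-FLOPS: the same penalty, rescaled per term by a smooth function
    of its relative document frequency df_frac = DF_t / |C|."""
    w_t = activ(df_frac, alpha, beta)
    return float(np.sum((w_t * reps.mean(axis=0)) ** 2))
```

With `df_frac` at 1 for every term, $w_t \approx 1$ and DF-FLOPS collapses to standard FLOPS; with `df_frac` near 0, the penalty all but vanishes, leaving rare terms unregularized.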

3. Integrating DF-FLOPS in the SPLADE-doc Pipeline

DF-FLOPS regularization slots seamlessly into the standard SPLADE-doc training procedure. Documents are encoded through a DistilBERT-based encoder, and the regularization strength $\lambda$ is ramped quadratically over a 50k-step pretraining phase. Every 100 validation steps, empirical DF estimates are refreshed using held-out mini-batch statistics. The per-term penalty $w_t$ is computed, and DF-FLOPS multiplies each term's gradient by $w_t^2$, entailing no additional backward passes over standard automatic differentiation. The overall loss is

$$L = L_{\mathrm{rank}} + \lambda \cdot \ell_{\mathrm{DF\text{-}FLOPS}}$$

This makes the approach a drop-in replacement for FLOPS in existing codebases and pipelines (Porco et al., 21 May 2025).
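The quadratic ramp of $\lambda$ can be sketched as below; the 50k-step ramp follows the pretraining phase described above, while the maximum strength of 10 is an illustrative value chosen to mirror the $\lambda \sim 10$ setting reported in the results.

```python
def ramped_lambda(step, lam_max=10.0, ramp_steps=50_000):
    """Quadratic ramp: lambda grows as (step / ramp_steps)^2 up to lam_max,
    then stays flat for the rest of training."""
    t = min(step / ramp_steps, 1.0)
    return lam_max * t * t

# Total loss at a given step would then be:
#   L = rank_loss + ramped_lambda(step) * df_flops_reg
```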

4. Sparsity, Latency, and Index Efficiency

Adopting DF-FLOPS regularization yields significant improvements in posting-list lengths and latency. Empirical results on 8.8M MS MARCO passages indexed in Apache Solr v9 show:

| Model | Top-token DF% | Matches/query | Passage terms | Latency avg (ms) | 99th %ile (ms) |
|---|---|---|---|---|---|
| SPLADE-doc + FLOPS | 95.8% | 8.63M | 583.8 | 922 | 1945 |
| SPLADE-doc + DF-FLOPS | 8.0% | 1.91M | 301.6 | 161 | 342 |
| BM25 | — | — | — | 68.9 | 241.3 |

DF-FLOPS pushes the majority of high-frequency tokens below 10% document frequency, reducing the mean number of candidate documents per query ($\approx 4.5\times$ fewer matches) and shrinking average passage vector length by nearly half. This slashes end-to-end retrieval latency by approximately $10\times$ (922 ms $\rightarrow$ 161 ms) and achieves p99 latency close to BM25, without wholesale stopword removal: critical terms with high DF can still be retained if salient (Porco et al., 21 May 2025).

5. Empirical Performance and Benchmarking

Table 1 from (Porco et al., 21 May 2025) summarizes in-domain effectiveness and speed on MS MARCO and TREC:

| Model | MRR@10 | Recall@1K | Latency avg (ms) |
|---|---|---|---|
| BM25 | 18.4 | 85.3 | 68.9 |
| SPLADE-doc + FLOPS ($\lambda \approx 10^{-3}$) | 32.2 | 92.4 | 922.0 |
| + pruning@150 | 32.0 | 92.1 | 792.1 |
| FLOPS ($\lambda = 0.1$) | 29.2 | 88.8 | 331.6 |
| FLOPS ($\lambda = 1.0$) | 28.3 | 88.4 | 160.9 |
| DF-FLOPS ($\alpha = 0.1$, $\beta = 10$, $\lambda \sim 10$) | 30.0 | 92.9 | 161.0 |
| + pruning@150 | 29.7 | 93.0 | 87.8 |

DF-FLOPS achieves a recall equal to or slightly exceeding FLOPS, with a mean reciprocal rank (MRR@10) only marginally reduced (32.2 → 30.0). Applying modest pruning yields latencies competitive with BM25. On 13 BEIR zero-shot cross-domain tasks, DF-FLOPS outperforms standard FLOPS on 12/13 datasets and generally narrows the performance gap to the BM25 baseline (Porco et al., 21 May 2025).

6. Trade-offs, Tuning, and Production Guidelines

  • Regularization strength ($\lambda$): Since $\ell_{\mathrm{DF\text{-}FLOPS}}$ is numerically smaller for low-DF terms, viable values of $\lambda$ can be $10\times$–$1000\times$ larger than for standard FLOPS without oversparsifying. Empirically, $\lambda \in [10^{-1}, 10^{3}]$ is robust.
  • DF cutoff ($\alpha$) and steepness ($\beta$): Setting $\alpha \approx 0.1$ (cutoff for terms exceeding 10% corpus DF) and $\beta = 10$ (sharp transition) effectively suppresses unwanted stopwords. Adjusting these allows for more aggressive or more conservative DF control.
  • Effectiveness vs. latency: In-domain, a 2–4 point MRR@10 drop buys a roughly $10\times$ latency reduction. Cross-domain, DF-FLOPS often improves robustness and generalization relative to FLOPS, likely by limiting overfitting to dataset-specific high-DF tokens.
  • Deployment recommendations: DF-FLOPS is recommended for all large-scale inverted-index deployments (e.g., Solr, Lucene) where p99 latency under 200ms is a requirement. For scenarios prioritizing maximum effectiveness (e.g., research benchmarks or two-stage reranking), standard FLOPS or relaxed regularization may be preferred (Porco et al., 21 May 2025).

Beyond DF-FLOPS, other mechanisms for efficiency include hybrid hard/soft thresholding on term weights (Qiao et al., 2023) and leveraging decoder-only LLMs with echo embeddings (as in Echo-Mistral-SPLADE (Doshi et al., 2024)). However, DF-FLOPS is distinctive in directly regularizing per-term posting-list length via corpus-level statistics rather than relying on static stopword lists or global mask pruning. This yields system-wide reductions in inverted-index workload without unduly sacrificing retrieval quality or the lexical specificity achievable with learned sparse retrieval.
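Representation thresholding of the "pruning@150" kind seen in the results tables can be sketched as a simple top-$k$ cutoff on a precomputed document vector; this is an illustrative NumPy implementation, not the authors' code.

```python
import numpy as np

def prune_top_k(doc_rep, k=150):
    """Keep only the k largest term weights in a document vector and
    zero out the rest, shortening posting lists at index time."""
    if np.count_nonzero(doc_rep) <= k:
        return doc_rep.copy()
    pruned = np.zeros_like(doc_rep)
    top = np.argpartition(doc_rep, -k)[-k:]  # indices of the k largest weights
    pruned[top] = doc_rep[top]
    return pruned

# Toy vector with 4 nonzero weights, pruned to the top 2:
vec = np.array([0.5, 0.0, 2.0, 1.0, 0.1])
pruned = prune_top_k(vec, k=2)
```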

SPLADE-doc and its DF-targeted regularization constitute a significant step in reconciling the demands of neural IR effectiveness with the strict latency regimes and index pressure of web-scale search. These advances underpin current state-of-the-art production LSR deployments, making the trade space between speed, index size, and search quality highly tunable for practical settings (Porco et al., 21 May 2025, Formal et al., 2021, Lassance et al., 2022).
