Exp4Fuse: Dual-Route Rank Fusion
- Exp4Fuse is a dual-route framework that fuses original and LLM-augmented queries to enhance sparse retrieval performance.
- Its design integrates a modified reciprocal rank fusion algorithm that balances semantic expansion with lexical precision.
- Benchmark results show significant improvements over existing methods while reducing computational costs.
Exp4Fuse is a two-stage, dual-route rank fusion framework designed to enhance the effectiveness of sparse information retrieval systems by integrating zero-shot LLM–based query expansion in a computationally efficient manner. The method addresses key limitations of previous LLM-based query expansion techniques for sparse retrievers, such as complex prompting and reliance on dense re-ranking, by leveraging a fusion architecture that maximizes both semantic coverage and lexical precision without requiring multiple dense or multi-stage retrieval passes. Exp4Fuse has demonstrated state-of-the-art (SOTA) performance on several benchmarks when augmenting modern learned sparse retrievers, particularly SPLADE++ and uniCOIL variants, and consistently surpasses previous LLM-based query expansion baselines in both effectiveness and cost-efficiency (Liu et al., 5 Jun 2025).
1. Dual-Route Fusion Architecture
Exp4Fuse operates via a two-stage retrieval and fusion paradigm:
- In Stage 1, two independent retrieval routes are established from a user query q:
- Original Query Route (OQ): The unaltered query q is fed to a sparse retriever, yielding ranked list R_OQ.
- LLM-Augmented Query Route (EQ): A zero-shot LLM prompt (“Please write a passage to answer the question.”, followed by the query) generates a hypothetical document d′. To balance document and query lengths, the query is repeated n times and concatenated with d′, giving the expanded query q⁺. The expanded query q⁺ is submitted to the same sparse retriever, obtaining R_EQ.
- In Stage 2, the outputs R_OQ and R_EQ are merged by a modified reciprocal rank fusion (RRF) algorithm that integrates list-specific weights and penalizes documents appearing in only one list, yielding the final fused ranking R_F.
This dual-route structure preserves the high recall of learned sparse retrievers on exact-match queries while adding semantic breadth through LLM-augmented keyword expansion.
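The two-stage flow above can be sketched as a small skeleton in which the retriever, the LLM, and the fusion step are all injected callables; the function names, signatures, the repetition factor n=5, and the literal prompt formatting are illustrative assumptions, not the paper's reference implementation.

```python
def exp4fuse(query, retrieve, generate, fuse, n=5, top_k=1000):
    """Two-stage Exp4Fuse skeleton (illustrative sketch).

    retrieve(text, k) -> ranked list of doc ids (any sparse retriever)
    generate(prompt)  -> passage string (single zero-shot LLM call)
    fuse(a, b)        -> fused ranking (e.g., the modified RRF)
    """
    # Stage 1, Route OQ: original query against the sparse index.
    r_oq = retrieve(query, top_k)

    # Stage 1, Route EQ: zero-shot hypothetical document, then a
    # length-balanced expanded query (query repeated n times + passage).
    passage = generate(f"Please write a passage to answer the question. {query}")
    expanded = " ".join([query] * n) + " " + passage
    r_eq = retrieve(expanded, top_k)

    # Stage 2: merge the two ranked lists.
    return fuse(r_oq, r_eq)
```

Because both routes hit the same sparse index, the only added cost over plain retrieval is one LLM call and one extra index lookup.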
2. Modified Reciprocal Rank Fusion Methodology
Exp4Fuse introduces a variant of reciprocal rank fusion that incorporates both query-route weighting and presence counts:
- For each document d, let r_OQ(d) and r_EQ(d) be its ranks in R_OQ and R_EQ respectively, and c(d) ∈ {1, 2} the number of lists in which d appears.
- The fused score sums a weighted, presence-scaled reciprocal-rank term over the lists that contain d:

  score(d) = Σ_{i ∈ {OQ, EQ}} w_i · c(d) / (k + r_i(d))

- The route weights w_OQ, w_EQ and the smoothing constant k take the paper's default values. This heuristic boosts documents retrieved by both routes, down-weights documents that appear in only one list, and smooths individual rank contributions, stabilizing fusion across different sparse retrievers (Liu et al., 5 Jun 2025).
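The modified RRF above is a few lines of code; the sketch below uses unit route weights and the classic k=60 smoothing constant as illustrative defaults, which may differ from the paper's settings.

```python
def exp4fuse_rrf(r_oq, r_eq, w_oq=1.0, w_eq=1.0, k=60):
    """Modified reciprocal rank fusion over the two routes (sketch).

    r_oq, r_eq: ranked lists of doc ids, best first.
    Each list contributes w * c(d) / (k + rank), where c(d) is the
    number of lists containing d, so documents found by both routes
    get a doubled contribution while single-list hits are dampened.
    """
    ranks_oq = {d: i + 1 for i, d in enumerate(r_oq)}
    ranks_eq = {d: i + 1 for i, d in enumerate(r_eq)}
    scores = {}
    for d in set(ranks_oq) | set(ranks_eq):
        c = (d in ranks_oq) + (d in ranks_eq)  # presence count: 1 or 2
        s = 0.0
        if d in ranks_oq:
            s += w_oq * c / (k + ranks_oq[d])
        if d in ranks_eq:
            s += w_eq * c / (k + ranks_eq[d])
        scores[d] = s
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing ["a", "b", "c"] with ["b", "d"] ranks "b" first, since it is the only document present in both lists.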
3. Zero-Shot LLM-Based Query Expansion Protocol
Exp4Fuse employs a lightweight, zero-shot query expansion pipeline:
- The LLM (e.g., GPT-4o-mini), using a single-turn prompt with moderate temperature (0.6) and top-p (0.9), is queried to produce the hypothetical passage d′, capped at 128 tokens.
- To avoid query dilution by verbose LLM generations, the original query is repeated n times and prepended to d′, yielding the final expanded query q⁺.
- Only a single LLM call is required per query, drastically reducing inference cost compared to few-shot or cascade prompting, and avoiding the need for multiple sparse or dense indices.
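The length-balancing step can be shown concretely; the prompt template's placeholder formatting and the repetition factor n=5 below are illustrative assumptions (the paper fixes its own default for n).

```python
# Assumed prompt template; the source gives the instruction text but not
# how the question is interpolated.
PROMPT = "Please write a passage to answer the question: {question}"

def expand_query(query, generate, n=5):
    """Build the expanded query q+ for the EQ route (sketch).

    generate: any callable wrapping one zero-shot LLM call (the paper
    uses temperature 0.6, top-p 0.9, max 128 generated tokens).
    Repeating the short query n times keeps its terms from being
    diluted by the much longer generated passage.
    """
    passage = generate(PROMPT.format(question=query))
    return " ".join([query] * n) + " " + passage
```

Only this single `generate` call is needed per query, which is where the cost advantage over few-shot or cascaded prompting comes from.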
4. Implementation, Hyperparameters, and Efficiency
Exp4Fuse supports popular sparse retrievers: BM25, uniCOIL, SLIM, SPLADEv2, and their optimized “++” variants.
- Pipeline: The original query and LLM-generated expansion are submitted in parallel, each retrieving the top-1000 documents. Their rankings are then fused with the modified RRF.
- Key parameters: the RRF smoothing constant k, route weights w_OQ and w_EQ, and the query-repetition factor n (all at the paper's defaults), plus LLM temperature 0.6, top-p 0.9, and a maximum generation length of 128 tokens.
- Resource profile: The framework requires only a single sparse index and one LLM call per query. The computational cost is roughly a single sparse retrieval pass plus LLM generation. This is markedly lower than dense-sparse fusion, dense re-ranking, or multi-stage pipelines (Liu et al., 5 Jun 2025).
5. Experimental Benchmarks and Comparative Evaluation
Exp4Fuse was benchmarked on three MS MARCO-related in-domain datasets (MS MARCO dev, TREC DL 2019, TREC DL 2020) and seven BEIR out-of-domain collections (DBPedia, FiQA, TREC-NEWS, NQ, Robust04, Touche2020, SciFact).
- Baselines included: classic sparse (BM25, docT5query), learned sparse (uniCOIL, SLIM++, SPLADE++ v1/v2), LLM-augmented sparse (query2doc, LameR), dense (TAS-B, SimLM, Contriever+HyDE), and multi-stage models (monoT5, RepLLaMA+RankLLaMA, SPLADE+ColBERT).
- Metrics: MS MARCO (MRR@10, Recall@1000), TREC DL (MAP, nDCG@10), BEIR (nDCG@10).
| Dataset / Retriever | Metric | Gain with Exp4Fuse | SOTA Achievement |
|---|---|---|---|
| BM25 (MS MARCO) | MRR@10 | +2.3 | |
| docT5query | nDCG@10 | +1 to +7 | |
| SPLADE++ v1 | nDCG@10: 73.1 → 77.6 | +4.5 | SOTA on TREC DL 2019 |
| BEIR (various) | nDCG@10 | +4.3 to +8.5 | |
Exp4Fuse yields large and stable gains across all datasets and baselines, outperforming previous LLM-based expansion baselines (query2doc, LameR, HyDE); when paired with strong sparse retrievers, it is competitive with or superior to state-of-the-art dense and multi-stage approaches (Liu et al., 5 Jun 2025).
6. Ablation Studies and Fusion Necessity
Analyses dissected the effect of multiple retrieval routes and the necessity of fusion:
- Route count: Adding more than two expansion routes (e.g., original, high-detail, medium-detail, and subtopic expansions) yields diminishing returns; performance peaks at two routes (the original query plus a single high-detail expanded query, OQ+HDQ) before degrading due to low-precision noise from further expansion.
- Fusion effect: Using only the LLM-expanded query (HDQ) can degrade learned sparse retriever performance (e.g., SPLADE++ nDCG@10: 73.1→67.8), while the two-route fusion not only recovers but boosts performance substantially (73.1→77.6). Adding a third route yields only marginal gain (+0.1–0.7) (Liu et al., 5 Jun 2025).
7. Practical Considerations and Design Recommendations
Exp4Fuse is maximally effective in conjunction with high-performance learned sparse retrievers that index real documents. By maintaining both exact- and semantic-match signals through parallel original/expanded routes and fusion—rather than replacing the original query—Exp4Fuse prevents degradation of lexical matching.
Zero-shot LLM prompting minimizes engineering and computational overhead. The fusion design, with a single sparse index and lightweight LLM querying, renders Exp4Fuse suitable for large-scale, low-latency applications. The approach is robust to diverse information needs and supports rapid integration with strong domain-adapted sparse models.
8. Concluding Insights and Future Prospects
Exp4Fuse demonstrates that rank fusion of original and LLM-expanded queries, under a well-designed reciprocal rank fusion regime, can efficiently achieve SOTA sparse retrieval with minimal additional complexity. The architecture underscores a strong trade-off: semantic enrichment via LLM generation is absorbed without sacrificing exact-match signal or incurring the cost of dense retrieval pipelines.
A plausible implication is that Exp4Fuse's underlying principle—dual-route fusion with LLM-based zero-shot augmentation—can generalize beyond web-scale IR to domains where sparse retrievers and costly dense or re-ranking solutions are infeasible, as long as high-quality LLM expansions are available (Liu et al., 5 Jun 2025).