Papers
Topics
Authors
Recent
Search
2000 character limit reached

Hybrid Retrieval for COVID-19 Literature: Comparing Rank Fusion and Projection Fusion with Diversity Reranking

Published 15 Apr 2026 in cs.IR and cs.CL | (2604.13728v1)

Abstract: We present a hybrid retrieval system for COVID-19 scientific literature, evaluated on the TREC-COVID benchmark (171,332 papers, 50 expert queries). The system implements six retrieval configurations spanning sparse (SPLADE), dense (BGE), rank-level fusion (RRF), and a projection-based vector fusion (B5) approach. RRF fusion achieves the best relevance (nDCG@10 = 0.828), outperforming dense-only by 6.1% and sparse-only by 14.9%. Our projection fusion variant reaches nDCG@10 = 0.678 on expert queries while being 33% faster (847 ms vs. 1271 ms) and producing 2.2x higher ILD@10 than RRF. Evaluation across 400 queries -- including expert, machine-generated, and three paraphrase styles -- shows that B5 delivers the largest relative gain on keyword-heavy reformulations (+8.8%), although RRF remains best in absolute nDCG@10. On expert queries, MMR reranking increases intra-list diversity by 23.8-24.5% at a 20.4-25.4% nDCG@10 cost. Both fusion pipelines evaluated for latency remain below the sub-2 s target across all query sets. The system is deployed as a Streamlit web application backed by Pinecone serverless indices.

Summary

  • The paper demonstrates that rank fusion (RRF) achieves superior nDCG@10 scores, significantly outperforming single-method retrieval approaches.
  • The paper introduces a B5 projection fusion method that reduces latency by 33% and increases result diversity (ILD@10) by over 2× compared to RRF.
  • The paper reveals that MMR reranking enhances diversity at the cost of relevance, highlighting a trade-off ideal for exploratory literature searches.

Hybrid Retrieval Strategies for COVID-19 Literature: Fusion, Projection, and Diversity

System Overview and Methodological Contributions

This paper evaluates hybrid retrieval strategies for COVID-19 scientific literature using the TREC-COVID benchmark. It implements six configurations based on sparse (SPLADE), dense (BGE), rank-level fusion (RRF), and projection-based vector fusion (B5) approaches. The methodologically distinct pipelines are: (1) sparse-only retrieval (B1), (2) dense-only retrieval (B2), (3) RRF fusion (B4), (4) RRF + MMR reranking (B3), (5) B5 projection fusion, and (6) B5 + MMR reranking. Query sets span expert-crafted, machine-generated, and three paraphrase styles to robustly assess retrieval efficacy and diversity. Figure 1

Figure 1: The RRF pipeline fuses ranked lists from SPLADE (sparse) and BGE (dense) encoders for COVID-19 literature queries, followed optionally by MMR reranking for diversity.

The B5 projection fusion applies Achlioptas sparse random projections to map SPLADE's high-dimensional sparse vectors into BGE's 768-dimensional dense space, enabling single-pass retrieval. This process does not depend on training data and leverages Johnson-Lindenstrauss-style guarantees for distance preservation, followed by L2 normalization and weighted vector fusion.

Experimental Evaluation and Numerical Results

Query Diversity and Benchmarking

The evaluation uses 400 queries across expert, generated, conversational, semi-technical, and keyword-heavy sets. Metrics include nDCG@10 (primary relevance), P@10, MRR@10, MAP@10, HitRate@10, and intra-list diversity (ILD@10).

Strong results are observed:

  • RRF (B4) achieves nDCG@10 = 0.8282 on expert queries, outperforming dense-only by 6.1% and sparse-only by 14.9%.
  • B5 projection fusion reaches nDCG@10 = 0.6779, with 33% reduction in latency (847 ms vs. 1271 ms) and 2.2× higher ILD@10 (0.389 vs. 0.176) compared to RRF.
  • On keyword-heavy paraphrases, B5 achieves an 8.8% relative gain, although RRF maintains a higher absolute score.
  • MMR reranking increases diversity (ILD@10) by 23.8–24.5% at a cost of 20.4–25.4% loss in nDCG@10, evidencing the relevance-diversity trade-off.

Latency, Fusion Tuning, and Generalization

B5 delivers consistent speedups in all query sets, and both pipelines remain under 2s latency. Mixing parameter αquery\alpha_{\text{query}} tuning in B5 shows optimal performance at 0.95 in the mixed-document setting, and further increases do not translate to a pure-dense baseline without document-side ablation.

On machine-generated queries, SPLADE sparse retrieval outperforms all fusion methods (nDCG@10 = 0.4272), confirming superior keyword-matching for terse inputs and demonstrating the sensitivity of retrieval methods to query formulation and qrel labeling.

Theoretical and Practical Implications

Hybrid Fusion Efficacy and Trade-offs

Hybrid fusion methods consistently outperform single-method retrieval for expert-formulated and reformulated queries. RRF's rank-level fusion leverages both exact token matches and semantic similarity, reinforcing its efficacy in high-relevance scenarios.

Projection fusion (B5) offers a practical alternative for latency-sensitive or diversity-focused applications. The inherent diversity in projected representations—substantially higher ILD@10—occurs even before explicit diversification via MMR, suggesting projection-based fusion as a mechanism for increasing information breadth.

MMR Reranking: Diversity-Relevance Dilemma

Empirical results establish MMR as a non-trivial relevance-diversity trade-off tool, increasing intra-list dissimilarity at the cost of nDCG@10. This mechanism is best reserved for contexts where diversity supersedes absolute relevance, such as exploratory retrieval or literature surveys.

Robustness to Query Reformulation

Dense, sparse, and fusion pipelines all show varying degrees of robustness across conversational, semi-technical, and keyword-focused paraphrases. Notably, B5 projection fusion is most stable in semi-technical settings, and exhibits the highest relative improvement for keyword-heavy queries. This underscores the value of hybrid and projection-based methods for resilience against query formulation shifts.

Limitations and Future Directions

The primary limitations are: reliance on a single benchmark (TREC-COVID), bottlenecks in sparse encoding throughput, absence of learned projection matrices, and no document-side fusion ablation. The authors identify several prospects for future research: supervised projection training, scaling to the full CORD-19 corpus, alpha tuning at the document level, integration of ColBERT rerankers, and contradiction detection in scientific claims.

Conclusion

This study formally demonstrates that hybrid retrieval, especially RRF fusion, substantially improves relevance for COVID-19 literature queries compared to single-method approaches. Projection-based vector fusion (B5) sacrifices some relevance for notable gains in latency and result diversity, providing a distinct operational profile for retrieval systems. MMR reranking reliably increases diversity, confirming its role as a relevance-diversity negotiation instrument. The findings clarify retrieval method suitability based on query characteristics and operational requirements, and suggest projection fusion as a promising direction for scalable, diversity-enhanced scientific search in future IR systems.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.