
Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models (2503.14887v2)

Published 19 Mar 2025 in cs.IR and cs.LG

Abstract: Scaling dense retrievers to larger LLM backbones has been a dominant strategy for improving their retrieval effectiveness. However, this has substantial cost implications: larger backbones require more expensive hardware (e.g. GPUs with more memory) and lead to higher indexing and querying costs (latency, energy consumption). In this paper, we challenge this paradigm by introducing PromptPRF, a feature-based pseudo-relevance feedback (PRF) framework that enables small LLM-based dense retrievers to achieve effectiveness comparable to much larger models. PromptPRF uses LLMs to extract query-independent, structured and unstructured features (e.g., entities, summaries, chain-of-thought keywords, essay) from top-ranked documents. These features are generated offline and integrated into dense query representations via prompting, enabling efficient retrieval without additional training. Unlike prior methods such as GRF, which rely on online, query-specific generation and sparse retrieval, PromptPRF decouples feedback generation from query processing and supports dense retrievers in a fully zero-shot setting. Experiments on TREC DL and BEIR benchmarks demonstrate that PromptPRF consistently improves retrieval effectiveness and offers favourable cost-effectiveness trade-offs. We further present ablation studies to understand the role of positional feedback and analyse the interplay between feature extractor size, PRF depth, and model performance. Our findings demonstrate that with effective PRF design, scaling the retriever is not always necessary, narrowing the gap between small and large models while reducing inference cost.

Pseudo-Relevance Feedback in Zero-Shot Dense Retrieval Using LLMs

This paper investigates the efficacy of pseudo-relevance feedback (PRF) for zero-shot dense retrieval with LLMs. The authors propose an approach named "PromptPRF," which builds on the PromptReps method to enhance query representations and improve retrieval effectiveness.

Methodology Overview

The core of the approach is to use LLMs to extract salient features from the top-ranked documents of an initial retrieval pass. These features range from keywords and summaries to more elaborate constructs such as entities and full essays. The extracted features are then used to refine the query representation in a dense retrieval setting, all within a zero-shot paradigm. Because the features are query-independent, they can be generated entirely offline, which conserves resources and adds no query-time latency.
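The offline feature-extraction step can be sketched as follows. This is a minimal illustration: `call_llm` is a stand-in for any LLM completion call, and the prompt templates are placeholders, not the paper's exact prompts:

```python
# Sketch of offline, query-independent feature extraction.
# `call_llm` and the templates below are illustrative placeholders.

FEATURE_PROMPTS = {
    "keywords": "List the most important keywords of this passage:\n{passage}",
    "summary": "Summarise this passage in one sentence:\n{passage}",
    "entities": "List the named entities mentioned in this passage:\n{passage}",
}

def call_llm(prompt: str) -> str:
    """Stub: replace with a real LLM call (e.g. a local Llama model)."""
    return "stub output for: " + prompt.splitlines()[0]

def extract_features(passage: str) -> dict[str, str]:
    """Generate query-independent features for one passage, offline."""
    return {name: call_llm(template.format(passage=passage))
            for name, template in FEATURE_PROMPTS.items()}

# Features for every indexed passage can be precomputed and cached,
# so no extra LLM calls are needed at query time.
features = extract_features("The Eiffel Tower is in Paris.")
```

Since the features depend only on the passage, this loop can run once over the whole collection and its outputs can be cached alongside the index.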

PromptPRF integrates the following critical components into its framework:

  1. Initial Retrieval: Queries are encoded with LLM-based embeddings and used for dense retrieval, without any additional training.
  2. Feature Extraction: LLMs generate passage-level features based on pre-defined prompt templates, enhancing context without introducing excessive noise.
  3. Query Refinement: The refined query incorporates the features from pseudo-relevant documents, thus improving the retrieval accuracy in subsequent stages.
  4. Second-Stage Retrieval: The refined query representation is engaged for improved passage ranking, leveraging the contextualized information from PRF.
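The four stages above can be tied together in a short loop. A minimal sketch, assuming a dense retriever exposed as `embed`/`search` callables and a store of precomputed per-passage features (all names here are illustrative, not the paper's API):

```python
from typing import Callable

def prompt_prf(query: str,
               embed: Callable[[str], list[float]],
               search: Callable[[list[float], int], list[str]],
               feature_store: dict[str, str],
               prf_depth: int = 3) -> list[str]:
    # 1. Initial retrieval with the plain query embedding.
    first_pass = search(embed(query), prf_depth)
    # 2. Look up offline-generated features for the top-ranked passages.
    feedback = [feature_store[pid] for pid in first_pass if pid in feature_store]
    # 3. Query refinement: fold the feedback into the query text.
    refined = query + " " + " ".join(feedback)
    # 4. Second-stage retrieval with the refined representation.
    return search(embed(refined), 10)

# Toy usage with stand-in components (replace with a real dense retriever).
def _embed(text: str) -> list[float]:
    return [float(len(text))]

def _search(vec: list[float], k: int) -> list[str]:
    return [f"d{i}" for i in range(k)]

_store = {"d0": "entities: Example", "d1": "summary: example passage"}
results = prompt_prf("example query", _embed, _search, _store, prf_depth=2)
```

Note that only steps 1, 3, and 4 run at query time; step 2 is a cache lookup, since the features were generated offline.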

Experimental Findings

The experiments use the TREC DL 2019 and 2020 passage-ranking benchmarks, as well as BEIR. Key observations include:

  • Incorporating PRF significantly improves retrieval effectiveness, particularly for smaller dense retrievers, which with PRF can match the effectiveness of much larger models that do not use it.
  • On TREC DL'19 tasks, PromptPRF enhances nDCG@10 from 0.3695 to 0.5013 for Llama3.2 3B dense retrievers, nearly achieving parity with larger models.
  • Smaller models benefit notably from larger feature extractors, indicating the importance of context-rich feature generation. However, diminishing returns are apparent when scaling extractor size for already large dense retrieval models.
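For reference, nDCG@10, the metric quoted above, is the standard normalised discounted cumulative gain truncated at rank 10. A minimal implementation (not from the paper, which would typically use a tool such as trec_eval):

```python
import math

def dcg(rels: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_10(ranked_rels: list[float]) -> float:
    """nDCG@10: DCG of the ranking divided by DCG of the ideal ranking."""
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg(ideal)
    return dcg(ranked_rels) / denom if denom > 0 else 0.0
```

A ranking already sorted by relevance scores 1.0; any misordering of relevant results lowers the value, which is what the reported gains (e.g. 0.3695 to 0.5013) reflect.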

Implications and Future Directions

The practical implications of this approach are substantial where computational resources are constrained: PRF reduces the hardware required in production, which benefits real-time applications such as conversational search. Because the feedback features are generated offline, the approach is also well suited to latency-sensitive deployments.

Theoretically, the paper challenges the common assumption that dense retrieval effectiveness scales primarily with model size. By leveraging relevance feedback more intelligently, this research outlines a path for smaller models to close the gap traditionally held by larger, more resource-intensive configurations.

Future work identified by the authors includes examining the optimal PRF depth and combining multiple PRF feature types to further improve effectiveness.

Overall, this research advances dense retrieval systems by harnessing LLM capabilities to deliver enhanced query representations through pseudo-relevance feedback.

Authors (4)
  1. Hang Li
  2. Xiao Wang
  3. Bevan Koopman
  4. Guido Zuccon