Pseudo-Relevance Feedback: Methods & Advances
- Pseudo-Relevance Feedback (PRF) is a query refinement technique that uses the top-ranked documents to reformulate the original query, enhancing retrieval outcomes.
- Modern PRF approaches combine text-based methods with dense vector strategies, balancing improved retrieval metrics with computational efficiency.
- Advanced PRF models integrate neural architectures and learned query re-encodings, achieving measurable gains in MAP, nDCG, and recall while mitigating query drift.
Pseudo-Relevance Feedback (PRF) is a family of query-modification and signal-aggregation strategies that leverage the top-ranked results of an initial retrieval to refine queries and improve information retrieval effectiveness. PRF assumes that the highest-scoring documents returned by a retrieval model are likely to be relevant, using their contents—either directly (as expansion terms or embedding vectors) or through learned transformations—to reformulate the user's query prior to a second round of retrieval or re-ranking. In the context of neural and dense retrieval systems, PRF has evolved from classic bag-of-words term expansion to sophisticated methods involving dense vector interpolation, advanced attention mechanisms, and learned query re-encoding, providing consistent gains in recall, mean average precision, and ranking quality when tuned appropriately (Li et al., 2021).
1. Core Principles and Classical Formulation
In traditional settings, PRF operates as follows: given a query $q$, the retrieval model produces an initial ranked list of documents, from which the top $k$ (the pseudo-relevant set) are assumed to be relevant. Term statistics, vector embeddings, or other features from these feedback documents are then integrated with the original query, forming a new query representation $q'$. The expanded query is submitted for a second round of retrieval, ideally improving both coverage and ranking of relevant documents.
Two canonical mathematical instances are prevalent:
- Rocchio-style vector updates:
$\vec{q}' = \alpha \vec{q} + \frac{\beta}{k} \sum_{i=1}^{k} \vec{d}_i$, where $\vec{q}$ is the original query vector, $\vec{d}_i$ is the $i$-th feedback document embedding, and $\alpha$ and $\beta$ control the contributions of the query and the feedback respectively.
- Language model interpolation (RM3):
$P(w \mid \theta_q') = (1-\lambda)\,P(w \mid \theta_q) + \lambda\,P(w \mid \theta_F)$, where $\theta_q$ is the original query language model and $\theta_F$ the feedback document language model.
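Both canonical formulations can be sketched in a few lines. This is an illustrative toy, not a production implementation; the vectors are plain Python lists, the language models are term-to-probability dicts, and the `alpha`, `beta`, and `lam` defaults are hypothetical tuning values.

```python
def rocchio_update(query, feedback_docs, alpha=0.8, beta=0.2):
    """Rocchio: q' = alpha * q + (beta / k) * sum of the k feedback vectors."""
    k = len(feedback_docs)
    centroid = [sum(d[j] for d in feedback_docs) / k for j in range(len(query))]
    return [alpha * qj + beta * cj for qj, cj in zip(query, centroid)]

def rm3_interpolate(query_lm, feedback_lm, lam=0.5):
    """RM3: P'(w) = (1 - lam) * P(w | theta_q) + lam * P(w | theta_F)."""
    vocab = set(query_lm) | set(feedback_lm)
    return {w: (1 - lam) * query_lm.get(w, 0.0) + lam * feedback_lm.get(w, 0.0)
            for w in vocab}
```

Note that RM3's interpolation preserves a proper probability distribution (the weights sum to one), whereas the Rocchio update is an unnormalized vector combination.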
2. Methodological Advances: Neural and Dense PRF
Recent work has generalized PRF to deep neural and dense retrieval models by embedding feedback documents in various ways, employing both text-based and vector-based integrations (Li et al., 2021):
Text-based PRF for Deep Rerankers
- Concatenate-and-Truncate (CT):
The top $k$ feedback passages are appended to the query and truncated to fit the transformer input window: $q' = \operatorname{truncate}(q \oplus d_1 \oplus \cdots \oplus d_k)$.
- Concatenate-and-Aggregate (CA):
Each PRF passage is concatenated to the query in turn, resulting in $k$ queries: $q'_i = q \oplus d_i$ for $i = 1, \dots, k$.
- Sliding Window (SW):
All PRF passages are joined and split into windows; each window forms a separate query.
After retrieval or reranking with these queries, scores are aggregated (e.g., simple average, max, Borda fusion). These methods can improve shallow ranking metrics but are computationally intensive: each CA run requires multiple transformer passes per query.
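The three query-construction strategies and the score fusion step can be sketched as follows. This is a hedged illustration: the whitespace tokenizer and `max_len` token budget stand in for a real transformer tokenizer and input window, and only mean/max fusion is shown (Borda fusion would rank-then-sum instead).

```python
def ct_query(query, passages, max_len=512):
    """Concatenate-and-Truncate: one query, clipped to the input window."""
    tokens = (query + " " + " ".join(passages)).split()
    return " ".join(tokens[:max_len])

def ca_queries(query, passages):
    """Concatenate-and-Aggregate: one query per feedback passage."""
    return [query + " " + p for p in passages]

def sw_queries(query, passages, window=64, stride=32):
    """Sliding Window: join all passages, split into overlapping windows."""
    tokens = " ".join(passages).split()
    return [query + " " + " ".join(tokens[i:i + window])
            for i in range(0, max(1, len(tokens) - window + 1), stride)]

def aggregate(score_lists, how="mean"):
    """Fuse per-query score lists over the same candidate set (mean or max)."""
    fuse = max if how == "max" else (lambda xs: sum(xs) / len(xs))
    return [fuse(scores) for scores in zip(*score_lists)]
```

The cost asymmetry is visible directly: `ca_queries` and `sw_queries` each produce multiple queries (hence multiple transformer passes), while `ct_query` produces one.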
Vector-based PRF for Dense Retrievers
- Vector Average:
The feedback vectors and the query vector are averaged (or weighted as in Rocchio): $\vec{q}' = \frac{1}{k+1}\left(\vec{q} + \sum_{i=1}^{k} \vec{d}_i\right)$.
- Empirical guidance:
PRF is most effective when the original query retains at least half the weight ($\alpha \geq 0.5$) and when using $k = 1$ to $5$ feedback passages; larger $k$ often introduces query drift.
Vector-based PRF yields consistent improvements on deep ranking metrics (MAP, nDCG@10, recall@1000) across diverse datasets and is suitable for low-latency, large-scale applications due to computational efficiency (Li et al., 2021).
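The full two-round vector-based PRF loop can be illustrated over a toy dense index. This is a minimal sketch under stated assumptions: embeddings are hand-made lists, similarity is the inner product, and the brute-force `search` stands in for an approximate nearest-neighbor index; `k_feedback`, `k_final`, and `alpha` are hypothetical settings.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def search(query_vec, index, k):
    """Return the ids of the top-k documents by inner product."""
    return sorted(index, key=lambda d: dot(query_vec, index[d]), reverse=True)[:k]

def prf_search(query_vec, index, k_feedback=3, k_final=10, alpha=0.7):
    # First round: retrieve the pseudo-relevant set.
    feedback = [index[d] for d in search(query_vec, index, k_feedback)]
    # Rocchio-style refinement: alpha * q + (1 - alpha) * mean(feedback).
    dim = len(query_vec)
    centroid = [sum(v[j] for v in feedback) / len(feedback) for j in range(dim)]
    refined = [alpha * query_vec[j] + (1 - alpha) * centroid[j] for j in range(dim)]
    # Second round with the refined query vector.
    return search(refined, index, k_final)
```

The refinement itself is a single vector operation; the only extra cost over plain dense retrieval is the second nearest-neighbor search, which is why this family of methods stays latency-friendly.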
3. Learned and Advanced PRF Architectures
Several architectures have been proposed to extend PRF’s robustness and adaptivity:
- Neural PRF Frameworks:
NPRF wraps neural rankers and reinterprets feedback documents as pseudo-queries, soft-matching these against candidate documents with building-block models such as DRMM and K-NRM, and aggregating via gated sums or small MLPs (Li et al., 2018). This approach propagates feedback document relevance and retains end-to-end differentiability.
- Transformer-based Feedback (TPRF, ANCE-PRF):
PRF modules can be placed atop pre-trained dense retrievers, as in TPRF (Li et al., 2024) and ANCE-PRF (Yu et al., 2021, Li et al., 2021). TPRF applies a multi-layer transformer directly to the initial query and top-$k$ passage embeddings, outputting a new, PRF-refined query vector for nearest-neighbor search. ANCE-PRF concatenates the query and the top $k$ feedback passages (at the text level), feeding the sequence to a BERT-derived encoder; the [CLS] output serves as the improved query vector. Both approaches are highly effective; TPRF achieves near state-of-the-art results at much lower inference and memory cost.
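The information flow of a TPRF-style refiner can be illustrated with a single untrained attention step: the query embedding attends over itself and the top-$k$ feedback embeddings, and the attention-weighted mixture becomes the new query vector. This is only a toy stand-in; the actual TPRF module stacks trained multi-head transformer layers with learned projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_refine(query_vec, feedback_vecs, temperature=1.0):
    """One dot-product attention step: query attends over [q, d_1, ..., d_k]."""
    keys = [query_vec] + feedback_vecs
    scores = [dot if False else sum(q * k for q, k in zip(query_vec, key)) / temperature
              for key in keys]
    weights = softmax(scores)
    dim = len(query_vec)
    return [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(dim)]
```

Unlike the fixed Rocchio weighting, the mixture weights here depend on query-feedback similarity, which is the adaptivity that learned PRF modules exploit (with trained parameters rather than raw dot products).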
- Generative and Feature-Based PRF:
Recent work leverages LLMs for offline feature extraction (entities, summaries, keywords) and prompt-based fusion to support zero-shot PRF in dense retrieval. PromptPRF (Li et al., 19 Mar 2025) demonstrates that small dense retrievers (e.g., Llama 3.2 3B) combined with rich, LLM-extracted PRF features can match or outperform much larger retrievers without feedback. Generalized PRF frameworks (e.g., GPRF (Tu et al., 29 Oct 2025)) use utility-oriented RL pipelines to produce model-agnostic, natural language query rewrites, mitigating both the model and relevance assumptions of traditional PRF.
4. Empirical Findings and Guidelines
Extensive empirical evaluation reveals:
| PRF Method | Shallow $k$ (1–5) | Deep $k$ (≥10) | Query Weighting | Effectiveness | Computational Cost |
|---|---|---|---|---|---|
| Text-based | Optimal gains, stable | Query drift, diminishing or negative returns | Less sensitive, but truncation can hurt in CT | nDCG@1, RR up on easy sets | High |
| Vector-based | Largest MAP/nDCG gains | Little or negative return | Best with $\alpha \geq 0.5$ | Consistent improvements | Low |
- Text-based PRF is effective for "shallow" top-$k$ settings and easier datasets when an offline or generous latency budget is available; too many feedback passages or overly aggressive expansion leads to query drift.
- Vector-based PRF is more robust, suitable for first-stage low-latency settings, and computationally tractable, with per-query overhead on the order of milliseconds on a V100 GPU at shallow feedback depths.
- Hybrid approaches layering vector-based PRF with deep rerankers can further lift MAP, but runtime cost increases.
- Empirically, relative MAP/nDCG gains of 5–20% over base dense retrievers are typical under optimal shallow PRF configurations (Li et al., 2021).
5. Robustness, Drift, and Selective PRF
While PRF is highly effective on average, its performance is sensitive to feedback signal quality and the risk of topic drift:
- Feedback Quality:
Dense and learned PRF methods are robust to moderate or even weak signals, whereas classic bag-of-words PRF suffers catastrophic performance loss under noisy feedback (Li et al., 2022). Learned gating or neural aggregation mechanisms help mitigate the impact of poor first-stage retrieval.
- Query Drift Suppression:
Comparative regularization frameworks (e.g., LoL (Zhu et al., 2022)) penalize any increase in loss when adding more feedback documents, enforcing monotonic improvements and suppressing noise-induced drift in expanded queries.
- Selective PRF:
Transformer-based binary classifiers can predict, per query, whether PRF will improve retrieval. Soft fusion via the classifier's confidence further interpolates between the original and PRF-enriched results, providing consistent nDCG/MAP gains and avoiding drift on PRF-harmful queries (Datta et al., 2024).
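The soft-fusion step described above reduces to a confidence-weighted interpolation of per-document scores. In this sketch, `confidence` stands in for the trained transformer classifier's probability that PRF benefits the query; the scores and document ids are illustrative.

```python
def soft_fuse(orig_scores, prf_scores, confidence):
    """Interpolate per-document scores: c * PRF-run score + (1 - c) * original.

    Documents missing from one run are treated as scoring zero there, an
    assumption for this sketch; a real system would use rank-list fusion
    or a floor score instead.
    """
    docs = set(orig_scores) | set(prf_scores)
    return {d: confidence * prf_scores.get(d, 0.0)
               + (1 - confidence) * orig_scores.get(d, 0.0)
            for d in docs}
```

At `confidence = 0` the original ranking is returned untouched (no drift risk), and at `confidence = 1` the system fully commits to the PRF-enriched results; the classifier's probability smoothly trades off between the two.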
6. Computational and Practical Aspects
Scalability and deployment of PRF strategies depend on resource constraints and application requirements:
- Text-based PRF:
CA and SW strategies require one transformer inference per feedback passage or window ($k$ or more BERT passes per query), resulting in substantial online latency proportional to $k$. CT is more efficient, at the cost of context truncation. These strategies are infeasible in real-time services without heavy compute.
- Vector-based PRF:
Dense PRF can operate with a single additional vector operation per query and minimal GPU inference time, making it suitable for large-scale scenarios.
- Offline PRF:
The OPRF framework (Wen et al., 2023) shifts PRF to an offline stage. Pseudo-queries generated per document are indexed; at query time, only sparse lookup and lightweight aggregation are required, achieving much lower latency with effectiveness comparable to standard dense PRF.
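The offline-PRF idea amounts to precomputing expansion signals per document and reducing the online path to a lookup plus lightweight aggregation. In this sketch, the per-document pseudo-query term weights are hypothetical placeholders for what a framework like OPRF would generate offline.

```python
# Built once, offline (e.g., pseudo-queries generated per indexed document).
OFFLINE_PSEUDO_QUERIES = {
    "doc1": {"neural": 0.6, "retrieval": 0.4},
    "doc2": {"feedback": 0.7, "retrieval": 0.3},
}

def online_expand(first_pass_docs):
    """Query-time step: aggregate precomputed terms for the feedback set."""
    expansion = {}
    for doc in first_pass_docs:
        for term, weight in OFFLINE_PSEUDO_QUERIES.get(doc, {}).items():
            expansion[term] = expansion.get(term, 0.0) + weight / len(first_pass_docs)
    return expansion
```

No document text is read and no model is invoked at query time; the latency win comes from having paid the generation cost once, at indexing.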
- Prompt-based and Feature PRF:
Offline generation of enrichment features (e.g., PromptPRF) moves the expensive LLM inference out of the online path, enabling low-latency PRF for LLM-based dense retrieval at scale (Li et al., 19 Mar 2025).
7. Outlook: Constraints, Assumptions, and Future Directions
Two critical assumptions underpin PRF effectiveness:
- Relevance assumption: Top-k feedback documents are actually relevant; violations introduce noise and drift.
- Model assumption: PRF must fit the retriever architecture (e.g., term vectors for sparse, embeddings for dense).
Recent frameworks challenge these constraints, introducing model-agnostic, utility-trained PRF (e.g., GPRF (Tu et al., 29 Oct 2025)), selective decision layers (Datta et al., 2024), and generative feedback fusion (Mackie et al., 2023). Utility-oriented RL for query rewriting, cross-modal and conversational PRF, and fully offline feature pipelines represent promising research directions.
PRF remains a central tool for bridging vocabulary and semantic mismatches in retrieval, adaptable across sparse, dense, and hybrid retrieval stacks; for maximal effectiveness, the feedback mechanism, feedback depth $k$, weighting parameters, and integration with downstream reranking must be carefully tuned and validated per retrieval context (Li et al., 2021, Li et al., 19 Mar 2025, Li et al., 2024).