
Query-Based RAG: Adaptive Retrieval Methods

Updated 2 October 2025
  • Query-Based RAG is an adaptive framework that uses user queries to drive both retrieval and generation, integrating semantic understanding with dynamic query rewriting.
  • It employs hybrid strategies—combining BM25, dense vector KNN, and sparse encoder techniques—to optimize context precision and improve metrics like NDCG and F1 scores.
  • The framework finds application in diverse domains such as medical, legal, and multimodal settings, and emphasizes robust error correction and adaptive configuration for real-world use.

Query-Based Retrieval-Augmented Generation (RAG) refers to methodologies in which the retrieval and generation pipeline of LLMs is explicitly driven, reformulated, or enhanced at the level of the user query. Unlike classical RAG—which typically employs a static retrieval approach—query-based RAG frameworks exploit the semantics, structure, and intent of the query to drive strategies such as hybrid and adaptive retrieval, multi-stage rewriting, context- and error-aware processing, and cross-modal evidence fusion. These methods aim to maximize contextual expressivity, efficiency, response accuracy, and robustness while scaling to large or heterogeneous knowledge sources.

1. Hybrid and Semantic Retrieval Strategies

Advanced query-based RAG systems incorporate hybrid pipelines that blend multiple retrieval modalities and scoring strategies. The Blended RAG approach (Sawarkar et al., 22 Mar 2024) exemplifies this direction by combining:

  • BM25-based keyword search: A sparse, term-level retrieval signal suited for direct phrase or keyword matching.
  • Dense vector-based KNN retrieval: Employing sentence transformer-based embeddings, supporting latent semantic matching via cosine similarity.
  • Sparse encoder techniques (ELSER): Providing high-dimensional, interpretable, and expanded representations, allowing for nuanced semantic mapping.

Hybrid queries leverage combinations of match and multi-match (cross fields, best fields) strategies, with the final relevance score given as R(doc) = α · Score_hybrid(doc) + β · Score_vector(doc), where α and β balance the importance of semantic and keyword aspects. This layered approach enhances retrieval precision, especially as corpora scale, and empirically yields strong improvements on NQ and TREC-COVID (NDCG@10: 0.67 and 0.87, respectively).
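The weighted score fusion can be sketched as follows; the documents, scores, and the weights α = 0.6 and β = 0.4 are illustrative, not values from the paper:

```python
def blended_score(hybrid_score: float, vector_score: float,
                  alpha: float = 0.6, beta: float = 0.4) -> float:
    """Weighted fusion of keyword/hybrid and dense-vector relevance scores."""
    return alpha * hybrid_score + beta * vector_score

# Toy candidate set: (hybrid score, vector score) per document.
docs = {"d1": (0.9, 0.2), "d2": (0.4, 0.8), "d3": (0.7, 0.7)}
ranked = sorted(docs, key=lambda d: blended_score(*docs[d]), reverse=True)
```

In practice the two component scores come from separate index queries (e.g., BM25 and cosine similarity over embeddings) and must be normalized to a comparable scale before fusion.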

2. Query Rewriting, Diversity, and Adaptation

The query is not always an optimal retrieval anchor; it may be noisy, under-specified, or mismatched with the document index. Query-based RAG frameworks therefore integrate sophisticated rewriting and adaptation:

  • Lossless and information-diverse rewriting: DMQR-RAG (Li et al., 20 Nov 2024) uses General Query Rewriting (GQR), Keyword Rewriting (KWR), Pseudo-Answer Rewriting (PAR), and Core Content Extraction (CCE). This generates a set q' = {q, RS_1(q), ..., RS_n(q)} with each strategy targeting a different information level to maximize diversity in the retrieved evidence while minimizing noise.
  • Ranking feedback for rewriting (RaFe): RaFe (Mao et al., 23 May 2024) eliminates the need for manual annotation by exploiting IR reranker feedback, with loss functions such as Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) rewarding rewrites that enhance downstream retrieval accuracy.
  • Query correction against entry errors (QE-RAG): Robust retrievers are trained on contrastive pairs of original and corrupted queries (Zhang et al., 5 Apr 2025), and retrieval-augmented query correction (RA-QCG) augments LLM-based correction with retrieved evidence, directly addressing query entry errors such as spelling, keyboard proximity, and visual similarity.

Adaptive configuration at the query level is also critical for performance in real-world serving scenarios. RAGServe (Ray et al., 13 Dec 2024) integrates a query profiler LLM, mapping query complexity and required reasoning to configuration knobs (retrieved chunk count, synthesis strategy, summary length), with joint scheduling to optimize for both quality and latency per query.
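A minimal sketch of the per-query configuration idea follows; the complexity tiers, knob names, and values are illustrative assumptions, not RAGServe's actual settings:

```python
def plan(complexity: str) -> dict:
    """Map a profiled query-complexity tier to pipeline knobs.

    In RAGServe the profiling is done by an LLM and the knob values are
    chosen jointly with scheduling; these static tiers are a toy stand-in.
    """
    knobs = {
        "simple":  {"chunks": 2,  "synthesis": "single_pass", "summary_len": 64},
        "medium":  {"chunks": 5,  "synthesis": "map_reduce",  "summary_len": 128},
        "complex": {"chunks": 10, "synthesis": "iterative",   "summary_len": 256},
    }
    return knobs[complexity]
```

The key point is that the retrieval budget and synthesis strategy become per-query decisions rather than global constants.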

3. Information Gain and Efficiency in Retrieval

Recent research proposes probabilistic and utility-based selection criteria that directly optimize the relevance and diversity of retrieved passages:

  • Relevant information gain (Dartboard algorithm): (Pickett et al., 16 Jul 2024) Maximizes s(G, q, A, σ) = Σ_t N(q, t, σ) · max_{g ∈ G} N(t, g, σ), such that the final context window contains as much non-redundant, query-relevant information as possible. This approach robustly outperforms classic Maximal Marginal Relevance (MMR) in both end-to-end QA accuracy and NDCG on RGB.
  • Query grouping for disk-based vector search (CaGR-RAG): (Jeong et al., 2 May 2025) Batches queries based on cluster-access patterns using the Jaccard index for grouping, with opportunistic prefetching mitigating cache misses and reducing 99th percentile tail latency by up to 51.55%.
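A greedy version of the Dartboard objective can be sketched as below, using a Gaussian kernel over embedding distance for N(·, ·, σ); the embeddings and σ are toy values, and the real algorithm's details may differ:

```python
import numpy as np

def kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> float:
    """Unnormalized Gaussian kernel on embedding distance (stands in for N(., ., sigma))."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))

def dartboard_select(query: np.ndarray, cands: list, k: int, sigma: float = 1.0) -> list:
    """Greedily grow G to maximize sum_t N(q,t,sigma) * max_{g in G} N(t,g,sigma):
    every candidate t contributes its query relevance, credited at the rate of
    the best-covering selected passage, so near-duplicates add little gain."""
    n = len(cands)
    relevance = np.array([kernel(query, c, sigma) for c in cands])
    cover_by = [np.array([kernel(cands[i], c, sigma) for c in cands]) for i in range(n)]
    selected: list = []
    best_cover = np.zeros(n)
    for _ in range(k):
        gains = [
            float(relevance @ np.maximum(best_cover, cover_by[i]))
            if i not in selected else -np.inf
            for i in range(n)
        ]
        pick = int(np.argmax(gains))
        selected.append(pick)
        best_cover = np.maximum(best_cover, cover_by[pick])
    return selected
```

Unlike top-k similarity search, a second passage that merely duplicates an already-selected one raises `max_{g in G} N(t, g, sigma)` almost nowhere and is therefore skipped in favor of diverse evidence.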

Query-based adaptation systems also tackle resource efficiency at a deeper pipeline level. SymRAG (Hakim et al., 15 Jun 2025) computes a query complexity score κ(q) and resource state R(t), using a utility-based routing function U(P | κ_eff, R) to choose between symbolic, neural, or hybrid processing paths, maximizing throughput while maintaining near-perfect accuracy.
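The utility-based routing idea can be sketched as follows; the utility weights and thresholds are invented for illustration and do not come from the SymRAG paper:

```python
def route(kappa: float, load: float) -> str:
    """Pick the highest-utility processing path given query complexity
    kappa in [0, 1] and resource load in [0, 1]. The linear utility
    models below are illustrative stand-ins for U(P | kappa_eff, R)."""
    utilities = {
        "symbolic": 0.4 + 0.5 * (1 - kappa) - 0.05 * load,  # cheap, best for easy queries
        "neural":   0.3 + 0.7 * kappa       - 0.30 * load,  # strong but resource-hungry
        "hybrid":   0.5 + 0.4 * kappa       - 0.20 * load,  # middle ground
    }
    return max(utilities, key=utilities.get)
```

With this shape, easy queries route symbolically, hard queries route neurally, and hard queries under heavy load fall back to the hybrid path.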

4. Application-Specific and Cross-Modal Query Frameworks

Domain-specific query-based RAG systems exploit the structure of the application domain:

  • Medical reasoning (DoctorRAG): (Lu et al., 26 May 2025) Fuses explicit clinical and implicit patient case knowledge via concept tagging and hybrid retrieval, using a concept-constrained cosine similarity s^K(q, d_i) = (v_q^T v_{d_i}) / (‖v_q‖ ‖v_{d_i}‖) only when the concept sets intersect. Iterative refinement (Med-TextGrad) uses multi-agent textual gradients to iteratively optimize the response with respect to both the context and the patient-specific query.
  • Legal retrieval (LegalRAG): (Kabir et al., 19 Apr 2025) Incorporates an LLM-based relevance checker and iterative query refiner in a bilingual (Bangla/English) setting, with hard constraints on cosine-similarity-matched chunk retrieval and repeated query refinement for document-centric QA.
  • Multimodal and STEM education (Uni-RAG): (Wu et al., 5 Jul 2025) Handles queries in diverse styles (text, sketches, audio) by prototype feature extraction and adaptive prompt retrieval from a domain-specific prompt bank, dynamically composed via Mixture-of-Expert Low-Rank Adaptation.
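The concept-constrained similarity used by DoctorRAG can be sketched directly from its definition; the vectors and concept tags below are toy inputs:

```python
import math

def concept_constrained_sim(v_q: list, v_d: list,
                            concepts_q: set, concepts_d: set) -> float:
    """Cosine similarity s^K(q, d_i), gated to 0 unless the query's and
    document's concept sets intersect."""
    if not concepts_q & concepts_d:
        return 0.0  # no shared concept: document is not a candidate
    dot = sum(a * b for a, b in zip(v_q, v_d))
    nq = math.sqrt(sum(a * a for a in v_q))
    nd = math.sqrt(sum(b * b for b in v_d))
    return dot / (nq * nd)
```

The gate keeps semantically similar but conceptually unrelated documents (e.g., a cardiology note for a dermatology query) out of the candidate set before any similarity ranking happens.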

Unified multi-source integration is realized in ER-RAG (Xia et al., 2 Mar 2025) using an Entity-Relationship API with standardized GET and JOIN operations, supporting seamless evidence fusion across structured databases, knowledge graphs, and web text. The two-stage generation decouples optimal source selection (via Direct Preference Optimization) and schema-guided API chain construction.
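The GET/JOIN abstraction can be sketched with in-memory stand-ins; the record shapes and attribute names are invented, and the real system issues these operations against databases, knowledge graphs, and web text rather than Python dicts:

```python
def GET(source: dict, entity: str, where: dict) -> list:
    """Fetch records of one entity type matching all attribute constraints."""
    return [r for r in source.get(entity, [])
            if all(r.get(k) == v for k, v in where.items())]

def JOIN(left: list, right: list, on: str) -> list:
    """Fuse evidence from two sources on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l.get(on) == r.get(on)]

# Toy sources: a knowledge graph and a relational table, unified behind the API.
kg = {"person": [{"name": "Ada", "field": "math"}, {"name": "Bob", "field": "bio"}]}
db = {"award": [{"name": "Ada", "award": "medal"}]}
result = JOIN(GET(kg, "person", {"field": "math"}), GET(db, "award", {}), "name")
```

Because every source answers the same two operations, the schema-guided API chain can mix sources freely without per-source glue code.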

5. Error Robustness and Failure Modes

Real-world deployment of query-based RAG necessitates robustness to query quality, modality mismatches, and retrieval noise:

  • Robustness to query entry errors (QE-RAG): (Zhang et al., 5 Apr 2025) Demonstrates that injection of moderate noise (20%–40% corruption) in queries degrades SOTA RAG systems, and that retrieval-augmented correction and robust retriever training can significantly restore F₁ scores even in adverse conditions.
  • Adaptive and iterative inference: AT-RAG (Rezaei et al., 16 Oct 2024) introduces topic modeling (BERTopic) for efficient document filtering, combined with iterative chain-of-thought reasoning and dynamic answer quality grading. This process filters and refines both the query and the context at each step, improving performance on multi-hop QA benchmarks and medical case studies.
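The kind of keyboard-proximity corruption QE-RAG trains against can be sketched as below; the neighbor map is a tiny illustrative subset, not the paper's actual noise model:

```python
import random

# Tiny illustrative subset of a QWERTY adjacency map.
KEY_NEIGHBORS = {"a": "qsz", "e": "wrd", "o": "ipl", "s": "adw"}

def corrupt(query: str, rate: float, rng: random.Random) -> str:
    """Replace a fraction of mapped characters with a keyboard neighbor,
    simulating query entry errors for robustness training/evaluation."""
    out = []
    for ch in query:
        if ch in KEY_NEIGHBORS and rng.random() < rate:
            out.append(rng.choice(KEY_NEIGHBORS[ch]))
        else:
            out.append(ch)
    return "".join(out)
```

Contrastive pairs of the form `(query, corrupt(query, rate, rng))` can then be fed to retriever training so that clean and corrupted variants embed close together.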

6. Benchmarking, Evaluation, and Future Directions

Comprehensive benchmarking is critical for query-based RAG:

  • mmRAG (Xu et al., 16 May 2025): A modular evaluation benchmark for RAG across text, tables, and knowledge graphs, supporting component-level breakdown of failures (e.g., query routing, retrieval, and generation). Multi-level annotation protocols (chunk-level, dataset-level) and hybrid metrics (NDCG, MAP, EM, F1) enable granular diagnostic analysis and transparent comparison.
  • Evaluation frameworks and metrics: Integrated end-to-end evaluation (as advocated by Sawarkar et al., 22 Mar 2024 and Lu et al., 26 May 2025) is necessary due to limitations of surface-level metrics like EM and F1, especially for assessing human-aligned improvements and domain-specific correctness.
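Of the retrieval metrics named above, NDCG@k is the least self-explanatory; a reference implementation from its standard definition (discounted cumulative gain normalized by the ideal ordering) looks like this:

```python
import math

def ndcg_at_k(relevances: list, k: int) -> float:
    """NDCG@k from graded relevance labels of a ranked result list.
    DCG discounts each result's relevance by log2 of its rank; the ideal
    ordering (relevances sorted descending) normalizes the score to [0, 1]."""
    def dcg(rels: list) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0; any inversion of a more-relevant item below a less-relevant one reduces the score.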

Open problems include standardizing joint metrics for retrieval–generation, dynamic trade-off strategies for index type and retrieval strategy selection (dense vs. sparse, symbolic vs. neural), and extending robust, adaptive, and efficient query-based techniques to cross-modal, low-resource, and streaming settings. Enhanced prompt and rewrite selection, continual learning from execution records (SEFRQO (Liu et al., 24 Aug 2025)), and plug-and-play compatibility with existing pipelines remain active research avenues.

7. Summary Table: Selected Query-Based RAG Methodologies

| Framework/Paper | Key Query-Based Technique | Main Impact/Metric Gains |
|---|---|---|
| Blended RAG (Sawarkar et al., 22 Mar 2024) | Hybrid keyword–semantic retrieval | NQ NDCG@10 ↑5.8%; SQuAD EM/F1 ↑50% |
| RaFe (Mao et al., 23 May 2024) | Annotation-free query rewriting w/ reranker | P@5/P@10 ↑2–3%; generalizable rewrites |
| DMQR-RAG (Li et al., 20 Nov 2024) | Diverse multi-strategy query rewriting | AmbigNQ H@5 ↑; FreshQA P@5 ↑14.46% |
| Dartboard (Pickett et al., 16 Jul 2024) | Relevant information gain in retrieval | RGB NDCG ↑0.973; higher end-to-end QA |
| QE-RAG (Zhang et al., 5 Apr 2025) | Error-robust retriever & query correction | F1 score maintained at high noise |
| FB-RAG (Chawla et al., 22 May 2025) | Forward/backward lookup scoring | LongBench F1 (Ours-F) ↑; latency ↓ |
| RAGServe (Ray et al., 13 Dec 2024) | Per-query quality/delay config adaptation | Latency ↓1.64–2.54×; throughput ↑4.5× |
| SymRAG (Hakim et al., 15 Jun 2025) | Neuro-symbolic adaptive query routing | EM: HotpotQA 100%, CPU use < 6% |
| Uni-RAG (Wu et al., 5 Jul 2025) | Adaptive, multimodal query-style retrieval | SER all types: Retrieval@1,5 ↑ |
| ER-RAG (Xia et al., 2 Mar 2025) | Unified GET/JOIN across sources (ER model) | CRAG: LLM score ↑3.1%, speed ↑5.5× |

All listed improvements and methodologies are directly grounded in the cited primary sources.


Query-based RAG represents an intersection of modern IR, LLM capabilities, representation learning, and adaptive optimization. Emerging directions target not only higher benchmark scores but also practical deployment—robustness to noisy queries, latency–quality trade-offs, scalable multi-source integration, efficient multimodal access, and domain-specific reliability. The literature demonstrates that the precise treatment of the query—via rewriting, correction, diversity, adaptivity, and semantic modeling—is indispensable for advancing retrieval-augmented machine intelligence.
