
Query-Based RAG: Adaptive Retrieval Methods

Updated 2 October 2025
  • Query-Based RAG is an adaptive framework that uses user queries to drive both retrieval and generation, integrating semantic understanding with dynamic query rewriting.
  • It employs hybrid strategies—combining BM25, dense vector KNN, and sparse encoder techniques—to optimize context precision and improve metrics like NDCG and F1 scores.
  • The framework finds application in diverse domains such as medical, legal, and multimodal settings, and emphasizes robust error correction and adaptive configuration for real-world use.

Query-Based Retrieval-Augmented Generation (RAG) refers to methodologies in which the retrieval and generation pipeline of LLMs is explicitly driven, reformulated, or enhanced at the level of the user query. Unlike classical RAG—which typically employs a static retrieval approach—query-based RAG frameworks exploit the semantics, structure, and intent of the query to drive strategies such as hybrid and adaptive retrieval, multi-stage rewriting, context- and error-aware processing, and cross-modal evidence fusion. These methods aim to maximize contextual expressivity, efficiency, response accuracy, and robustness while scaling to large or heterogeneous knowledge sources.

1. Hybrid and Semantic Retrieval Strategies

Advanced query-based RAG systems incorporate hybrid pipelines that blend multiple retrieval modalities and scoring strategies. The Blended RAG approach (Sawarkar et al., 22 Mar 2024) exemplifies this direction by combining:

  • BM25-based keyword search: A sparse, term-level retrieval signal suited for direct phrase or keyword matching.
  • Dense vector-based KNN retrieval: Employing sentence transformer-based embeddings, supporting latent semantic matching via cosine similarity.
  • Sparse encoder techniques (ELSER): Providing high-dimensional, interpretable, and expanded representations, allowing for nuanced semantic mapping.

Hybrid queries leverage combinations of match and multi-match (cross fields, best fields) strategies, with the final relevance score given as R(doc) = α · Score_hybrid(doc) + β · Score_vector(doc), where α and β balance the importance of semantic and keyword aspects. This layered approach enhances retrieval precision, especially as corpora scale, and empirically yields strong improvements on NQ and TREC-COVID (NDCG@10: 0.67 and 0.87, respectively).
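The weighted score fusion can be sketched as follows; the documents, scores, and the weights α = 0.6 and β = 0.4 are illustrative, not values from the paper:

```python
def blended_score(hybrid_score: float, vector_score: float,
                  alpha: float = 0.6, beta: float = 0.4) -> float:
    """Weighted fusion of keyword/hybrid and dense-vector relevance scores."""
    return alpha * hybrid_score + beta * vector_score

# Toy candidate set: (hybrid score, vector score) per document.
docs = {"d1": (0.9, 0.2), "d2": (0.4, 0.8), "d3": (0.7, 0.7)}
ranked = sorted(docs, key=lambda d: blended_score(*docs[d]), reverse=True)
```

In practice the two component scores come from separate index queries (e.g., BM25 and cosine similarity over embeddings) and must be normalized to a comparable scale before fusion.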

2. Query Rewriting, Diversity, and Adaptation

The query is not always an optimal retrieval anchor; it may be noisy, under-specified, or mismatched with the document index. Query-based RAG frameworks therefore integrate sophisticated rewriting and adaptation:

  • Lossless and information-diverse rewriting: DMQR-RAG (Li et al., 20 Nov 2024) uses General Query Rewriting (GQR), Keyword Rewriting (KWR), Pseudo-Answer Rewriting (PAR), and Core Content Extraction (CCE). This generates a set q' = {q, RS_1(q), ..., RS_n(q)} with each strategy targeting a different information level to maximize diversity in the retrieved evidence while minimizing noise.
  • Ranking feedback for rewriting (RaFe): RaFe (Mao et al., 23 May 2024) eliminates the need for manual annotation by exploiting IR reranker feedback, with loss functions such as Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) rewarding rewrites that enhance downstream retrieval accuracy.
  • Query correction against entry errors (QE-RAG): Robust retrievers are trained on contrastive pairs of original and corrupted queries (Zhang et al., 5 Apr 2025), and retrieval-augmented query correction (RA-QCG) augments LLM-based correction with retrieved evidence, directly addressing query entry errors such as spelling, keyboard proximity, and visual similarity.

Adaptive configuration at the query level is also critical for performance in real-world serving scenarios. RAGServe (Ray et al., 13 Dec 2024) integrates a query profiler LLM, mapping query complexity and required reasoning to configuration knobs (retrieved chunk count, synthesis strategy, summary length), with joint scheduling to optimize for both quality and latency per query.
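A minimal sketch of the per-query configuration idea follows; the complexity tiers, knob names, and values are illustrative assumptions, not RAGServe's actual settings:

```python
def plan(complexity: str) -> dict:
    """Map a profiled query-complexity tier to pipeline knobs.

    In RAGServe the profiling is done by an LLM and the knob values are
    chosen jointly with scheduling; these static tiers are a toy stand-in.
    """
    knobs = {
        "simple":  {"chunks": 2,  "synthesis": "single_pass", "summary_len": 64},
        "medium":  {"chunks": 5,  "synthesis": "map_reduce",  "summary_len": 128},
        "complex": {"chunks": 10, "synthesis": "iterative",   "summary_len": 256},
    }
    return knobs[complexity]
```

The key point is that the retrieval budget and synthesis strategy become per-query decisions rather than global constants.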

3. Information Gain and Efficiency in Retrieval

Recent research proposes probabilistic and utility-based selection criteria that directly optimize the relevance and diversity of retrieved passages:

  • Relevant information gain (Dartboard algorithm): (Pickett et al., 16 Jul 2024) Maximizes s(G, q, A, σ) = Σ_t N(q, t, σ) · max_{g ∈ G} N(t, g, σ), such that the final context window contains as much non-redundant, query-relevant information as possible. This approach robustly outperforms classic Maximal Marginal Relevance (MMR) in both end-to-end QA accuracy and NDCG on RGB.
  • Query grouping for disk-based vector search (CaGR-RAG): (Jeong et al., 2 May 2025) Batches queries based on cluster-access patterns using the Jaccard index for grouping, with opportunistic prefetching mitigating cache misses and reducing 99th percentile tail latency by up to 51.55%.
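A greedy version of the Dartboard objective can be sketched as below, using a Gaussian kernel over embedding distance for N(·, ·, σ); the embeddings and σ are toy values, and the real algorithm's details may differ:

```python
import numpy as np

def kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> float:
    """Unnormalized Gaussian kernel on embedding distance (stands in for N(., ., sigma))."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))

def dartboard_select(query: np.ndarray, cands: list, k: int, sigma: float = 1.0) -> list:
    """Greedily grow G to maximize sum_t N(q,t,sigma) * max_{g in G} N(t,g,sigma):
    every candidate t contributes its query relevance, credited at the rate of
    the best-covering selected passage, so near-duplicates add little gain."""
    n = len(cands)
    relevance = np.array([kernel(query, c, sigma) for c in cands])
    cover_by = [np.array([kernel(cands[i], c, sigma) for c in cands]) for i in range(n)]
    selected: list = []
    best_cover = np.zeros(n)
    for _ in range(k):
        gains = [
            float(relevance @ np.maximum(best_cover, cover_by[i]))
            if i not in selected else -np.inf
            for i in range(n)
        ]
        pick = int(np.argmax(gains))
        selected.append(pick)
        best_cover = np.maximum(best_cover, cover_by[pick])
    return selected
```

Unlike top-k similarity search, a second passage that merely duplicates an already-selected one raises `max_{g in G} N(t, g, sigma)` almost nowhere and is therefore skipped in favor of diverse evidence.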

Query-based adaptation systems also tackle resource efficiency at a deeper pipeline level. SymRAG (Hakim et al., 15 Jun 2025) computes a query complexity score κ(q) and resource state R(t), using a utility-based routing function U(P | κ_eff, R) to choose between symbolic, neural, or hybrid processing paths, maximizing throughput while maintaining near-perfect accuracy.
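The utility-based routing idea can be sketched as follows; the utility weights and thresholds are invented for illustration and do not come from the SymRAG paper:

```python
def route(kappa: float, load: float) -> str:
    """Pick the highest-utility processing path given query complexity
    kappa in [0, 1] and resource load in [0, 1]. The linear utility
    models below are illustrative stand-ins for U(P | kappa_eff, R)."""
    utilities = {
        "symbolic": 0.4 + 0.5 * (1 - kappa) - 0.05 * load,  # cheap, best for easy queries
        "neural":   0.3 + 0.7 * kappa       - 0.30 * load,  # strong but resource-hungry
        "hybrid":   0.5 + 0.4 * kappa       - 0.20 * load,  # middle ground
    }
    return max(utilities, key=utilities.get)
```

With this shape, easy queries route symbolically, hard queries route neurally, and hard queries under heavy load fall back to the hybrid path.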

4. Application-Specific and Cross-Modal Query Frameworks

Domain-specific query-based RAG systems exploit the structure of the application domain:

  • Medical reasoning (DoctorRAG): (Lu et al., 26 May 2025) Fuses explicit clinical and implicit patient case knowledge via concept tagging and hybrid retrieval, using a concept-constrained cosine similarity s^K(q, d_i) = (v_q^T v_{d_i}) / (‖v_q‖ ‖v_{d_i}‖) only when the concept sets intersect. Iterative refinement (Med-TextGrad) uses multi-agent textual gradients to iteratively optimize the response with respect to both the context and the patient-specific query.
  • Legal retrieval (LegalRAG): (Kabir et al., 19 Apr 2025) Incorporates an LLM-based relevance checker and iterative query refiner in a bilingual (Bangla/English) setting, with hard constraints on cosine-similarity-matched chunk retrieval and repeated query refinement for document-centric QA.
  • Multimodal and STEM education (Uni-RAG): (Wu et al., 5 Jul 2025) Handles queries in diverse styles (text, sketches, audio) by prototype feature extraction and adaptive prompt retrieval from a domain-specific prompt bank, dynamically composed via Mixture-of-Expert Low-Rank Adaptation.
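The concept-constrained similarity used by DoctorRAG can be sketched directly from its definition; the vectors and concept tags below are toy inputs:

```python
import math

def concept_constrained_sim(v_q: list, v_d: list,
                            concepts_q: set, concepts_d: set) -> float:
    """Cosine similarity s^K(q, d_i), gated to 0 unless the query's and
    document's concept sets intersect."""
    if not concepts_q & concepts_d:
        return 0.0  # no shared concept: document is not a candidate
    dot = sum(a * b for a, b in zip(v_q, v_d))
    nq = math.sqrt(sum(a * a for a in v_q))
    nd = math.sqrt(sum(b * b for b in v_d))
    return dot / (nq * nd)
```

The gate keeps semantically similar but conceptually unrelated documents (e.g., a cardiology note for a dermatology query) out of the candidate set before any similarity ranking happens.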

Unified multi-source integration is realized in ER-RAG (Xia et al., 2 Mar 2025) using an Entity-Relationship API with standardized GET and JOIN operations, supporting seamless evidence fusion across structured databases, knowledge graphs, and web text. The two-stage generation decouples optimal source selection (via Direct Preference Optimization) and schema-guided API chain construction.
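The GET/JOIN abstraction can be sketched with in-memory stand-ins; the record shapes and attribute names are invented, and the real system issues these operations against databases, knowledge graphs, and web text rather than Python dicts:

```python
def GET(source: dict, entity: str, where: dict) -> list:
    """Fetch records of one entity type matching all attribute constraints."""
    return [r for r in source.get(entity, [])
            if all(r.get(k) == v for k, v in where.items())]

def JOIN(left: list, right: list, on: str) -> list:
    """Fuse evidence from two sources on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l.get(on) == r.get(on)]

# Toy sources: a knowledge graph and a relational table, unified behind the API.
kg = {"person": [{"name": "Ada", "field": "math"}, {"name": "Bob", "field": "bio"}]}
db = {"award": [{"name": "Ada", "award": "medal"}]}
result = JOIN(GET(kg, "person", {"field": "math"}), GET(db, "award", {}), "name")
```

Because every source answers the same two operations, the schema-guided API chain can mix sources freely without per-source glue code.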

5. Error Robustness and Failure Modes

Real-world deployment of query-based RAG necessitates robustness to query quality, modality mismatches, and retrieval noise:

  • Robustness to query entry errors (QE-RAG): (Zhang et al., 5 Apr 2025) Demonstrates that injection of moderate noise (20%–40% corruption) in queries degrades SOTA RAG systems, and that retrieval-augmented correction and robust retriever training can significantly restore F₁ scores even in adverse conditions.
  • Adaptive and iterative inference: AT-RAG (Rezaei et al., 16 Oct 2024) introduces topic modeling (BERTopic) for efficient document filtering, combined with iterative chain-of-thought reasoning and dynamic answer quality grading. This process filters and refines both the query and the context at each step, improving performance on multi-hop QA benchmarks and medical case studies.
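The kind of keyboard-proximity corruption QE-RAG trains against can be sketched as below; the neighbor map is a tiny illustrative subset, not the paper's actual noise model:

```python
import random

# Tiny illustrative subset of a QWERTY adjacency map.
KEY_NEIGHBORS = {"a": "qsz", "e": "wrd", "o": "ipl", "s": "adw"}

def corrupt(query: str, rate: float, rng: random.Random) -> str:
    """Replace a fraction of mapped characters with a keyboard neighbor,
    simulating query entry errors for robustness training/evaluation."""
    out = []
    for ch in query:
        if ch in KEY_NEIGHBORS and rng.random() < rate:
            out.append(rng.choice(KEY_NEIGHBORS[ch]))
        else:
            out.append(ch)
    return "".join(out)
```

Contrastive pairs of the form `(query, corrupt(query, rate, rng))` can then be fed to retriever training so that clean and corrupted variants embed close together.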

6. Benchmarking, Evaluation, and Future Directions

Comprehensive benchmarking is critical for query-based RAG:

  • mmRAG (Xu et al., 16 May 2025): A modular evaluation benchmark for RAG across text, tables, and knowledge graphs, supporting component-level breakdown of failures (e.g., query routing, retrieval, and generation). Multi-level annotation protocols (chunk-level, dataset-level) and hybrid metrics (NDCG, MAP, EM, F1) enable granular diagnostic analysis and transparent comparison.
  • Evaluation frameworks and metrics: Integrated end-to-end evaluation (as advocated by Sawarkar et al., 22 Mar 2024 and Lu et al., 26 May 2025) is necessary due to limitations of surface-level metrics like EM and F1, especially for assessing human-aligned improvements and domain-specific correctness.
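Of the retrieval metrics named above, NDCG@k is the least self-explanatory; a reference implementation from its standard definition (discounted cumulative gain normalized by the ideal ordering) looks like this:

```python
import math

def ndcg_at_k(relevances: list, k: int) -> float:
    """NDCG@k from graded relevance labels of a ranked result list.
    DCG discounts each result's relevance by log2 of its rank; the ideal
    ordering (relevances sorted descending) normalizes the score to [0, 1]."""
    def dcg(rels: list) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0; any inversion of a more-relevant item below a less-relevant one reduces the score.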

Open problems include standardizing joint metrics for retrieval–generation, dynamic trade-off strategies for index type and retrieval strategy selection (dense vs. sparse, symbolic vs. neural), and extending robust, adaptive, and efficient query-based techniques to cross-modal, low-resource, and streaming settings. Enhanced prompt and rewrite selection, continual learning from execution records (SEFRQO (Liu et al., 24 Aug 2025)), and plug-and-play compatibility with existing pipelines remain active research avenues.

7. Summary Table: Selected Query-Based RAG Methodologies

| Framework/Paper | Key Query-Based Technique | Main Impact/Metric Gains |
|---|---|---|
| Blended RAG (Sawarkar et al., 22 Mar 2024) | Hybrid keyword–semantic retrieval | NQ NDCG@10 ↑5.8%; SQuAD EM/F1 ↑50% |
| RaFe (Mao et al., 23 May 2024) | Annotation-free query rewriting w/ reranker | P@5/P@10 ↑2–3%; generalizable rewrites |
| DMQR-RAG (Li et al., 20 Nov 2024) | Diverse multi-strategy query rewriting | AmbigNQ H@5 ↑; FreshQA P@5 ↑14.46% |
| Dartboard (Pickett et al., 16 Jul 2024) | Relevant information gain in retrieval | RGB NDCG ↑0.973; higher end-to-end QA |
| QE-RAG (Zhang et al., 5 Apr 2025) | Error-robust retriever & query correction | F1 score maintained at high noise |
| FB-RAG (Chawla et al., 22 May 2025) | Forward/backward lookup scoring | LongBench F1 (Ours-F) ↑; latency ↓ |
| RAGServe (Ray et al., 13 Dec 2024) | Per-query quality/delay config adaptation | Latency ↓1.64–2.54×; throughput ↑4.5× |
| SymRAG (Hakim et al., 15 Jun 2025) | Neuro-symbolic adaptive query routing | EM: HotpotQA 100%, CPU use < 6% |
| Uni-RAG (Wu et al., 5 Jul 2025) | Adaptive, multimodal query-style retrieval | SER all types: Retrieval@1,5 ↑ |
| ER-RAG (Xia et al., 2 Mar 2025) | Unified GET/JOIN across sources (ER model) | CRAG: LLM score ↑3.1%, speed ↑5.5× |

All listed improvements and methodologies are directly grounded in the cited primary sources.


Query-based RAG represents an intersection of modern IR, LLM capabilities, representation learning, and adaptive optimization. Emerging directions target not only higher benchmark scores but also practical deployment—robustness to noisy queries, latency–quality trade-offs, scalable multi-source integration, efficient multimodal access, and domain-specific reliability. The literature demonstrates that the precise treatment of the query—via rewriting, correction, diversity, adaptivity, and semantic modeling—is indispensable for advancing retrieval-augmented machine intelligence.
