
RQ-RAG Framework Overview

Updated 5 March 2026
  • RQ-RAG is a framework that explicitly refines queries through rewriting, decomposition, and disambiguation to optimize document retrieval.
  • It employs a hybrid retrieval strategy combining BM25 and dense vector methods to boost answer correctness and context fidelity.
  • Empirical evaluations demonstrate up to 14% improvements in retrieval metrics and enhanced robustness in regulatory and multi-hop question answering tasks.

Refine Query for Retrieval-Augmented Generation (RQ-RAG) Framework

The RQ-RAG framework is a family of methodologies that explicitly integrate query refinement—rewriting, decomposition, and disambiguation—upstream of the retrieval and generation stages in Retrieval-Augmented Generation (RAG) systems. Designed to address challenges in regulatory compliance, complex question answering, and multi-hop reasoning, RQ-RAG improves retrieval precision and factual grounding by programmatically rewriting initial user queries to optimize the information retrieval process. Variants of RQ-RAG have been evaluated both as standalone QA systems and as critical components in production compliance chatbots and competitive retrieval tasks, consistently demonstrating increased answer correctness, context fidelity, and robustness across retrieval backends (Chan et al., 2024, Hillebrand et al., 22 Jul 2025, Martinez et al., 20 Jun 2025).

1. Core Architecture and Workflow

RQ-RAG modifies the canonical RAG pipeline by interposing a query refinement stage before document retrieval. The high-level architecture consists of:

  1. Input: User submits a query $x$.
  2. Query Refinement Module: An LLM (e.g., Llama2-7B, GPT-4o) generates a control token (e.g., <SPECIAL_rewrite>, <SPECIAL_decompose>, <SPECIAL_disambiguate>) and a refined query $q'$.
  3. Retrieval Component: The refined query $q'$ is submitted to one or more indexed stores (vector, BM25, hybrid), producing top-$k$ documents $D = \{d_1, \ldots, d_k\}$.
  4. Generation Module: The LLM is prompted with $[x; \text{SPECIAL token}; q'; d_1; \dots; d_k]$ to output either another refinement instruction or the final answer $y$.
  5. Inference Process: RQ-RAG conducts a bounded tree search, alternating between further refinement and candidate answer generation until the <SPECIAL_answer> marker triggers response emission (Chan et al., 2024, Hillebrand et al., 22 Jul 2025).

A typical data-flow is as follows:

| Step | Operation | Output |
| --- | --- | --- |
| 1 | User Query | $x$ |
| 2 | Refine | $q'$ |
| 3 | Retrieve | $D = \{d_i\}$ |
| 4 | Generate | $y$ or more refinement |
| 5 | Postproc | Final Answer |
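
The workflow above can be sketched as a short control loop. This is a minimal illustration and not the authors' implementation: it follows a single greedy path, whereas the text describes a bounded tree search, and the `generate` and `retrieve` callables, the `Step` record, and the `max_steps` budget are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

ANSWER = "<SPECIAL_answer>"  # marker named in the workflow description


@dataclass
class Step:
    token: str                                      # control token chosen by the model
    query: str                                      # refined query q'
    docs: List[str] = field(default_factory=list)   # retrieved documents D


def run_pipeline(
    x: str,
    generate: Callable[[str, List[Step]], Tuple[str, str]],  # hypothetical LLM wrapper
    retrieve: Callable[[str], List[str]],                    # hypothetical retriever
    max_steps: int = 3,                                      # assumed refinement budget
) -> Optional[str]:
    """Greedy single-path loop: refine, retrieve, repeat until an answer is emitted."""
    history: List[Step] = []
    for _ in range(max_steps + 1):
        token, text = generate(x, history)
        if token == ANSWER:
            return text  # step 4 produced the final answer y
        if len(history) == max_steps:
            break  # refinement budget spent without an answer
        history.append(Step(token, text, retrieve(text)))
    return None
```

A tree-search variant would branch over several control tokens at each step and score the candidate paths; the loop here keeps only one branch to make the data flow visible.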

2. Query Refinement Strategies

The RQ-RAG paradigm supports three explicit operations:

  • Rewriting: Reformulation of vague, underspecified, or poorly worded queries to more focused forms tailored for the retrieval backend (BM25, dense, or hybrid). For example, PreQRAG creates two rewrites—one optimized for sparse retrieval (BM25-style, web-phrasing), one for dense (concise, term-rich) (Martinez et al., 20 Jun 2025).
  • Decomposition: Splitting multi-hop or composite questions into atomic sub-queries, improving recall by isolating each required knowledge step.
  • Disambiguation: Detection and clarification of ambiguous queries, generating targeted queries that resolve referential or lexical ambiguities before retrieval.

Formally, at refinement step $i$, the model context is

$$C_i = [x; q_1; D_1; \ldots; q_{i-1}; D_{i-1}]$$

where $q_j$ and $D_j$ are the refined query and retrieved documents from step $j$. Given $C_i$, the model emits either a new refinement token followed by a query, or <SPECIAL_answer> followed by $y$ (Chan et al., 2024, Martinez et al., 20 Jun 2025).
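
As a rough sketch of how this context and the control tokens might be handled in code, the snippet below serializes the context as plain text and parses a raw model output into a (operation, text) pair. The textual layout ("Question:", "Query n:", "Doc n.m:") and both function names are assumptions for illustration; only the `<SPECIAL_...>` token names come from the text.

```python
import re
from typing import List, Optional, Tuple

# Operations named in the text; anything else is treated as malformed output.
CONTROL_TOKENS = ("rewrite", "decompose", "disambiguate", "answer")
_PATTERN = re.compile(r"^\s*<SPECIAL_(\w+)>\s*(.*)$", re.DOTALL)


def build_context(x: str, steps: List[Tuple[str, List[str]]]) -> str:
    """Serialize [x; q_1; D_1; ...; q_{i-1}; D_{i-1}] as newline-separated text."""
    parts = [f"Question: {x}"]
    for n, (query, docs) in enumerate(steps, start=1):
        parts.append(f"Query {n}: {query}")
        parts.extend(f"Doc {n}.{m}: {doc}" for m, doc in enumerate(docs, start=1))
    return "\n".join(parts)


def parse_output(raw: str) -> Optional[Tuple[str, str]]:
    """Return (operation, text) or None if no known control token leads the output."""
    match = _PATTERN.match(raw)
    if match is None or match.group(1) not in CONTROL_TOKENS:
        return None
    return match.group(1), match.group(2).strip()
```

For example, `parse_output("<SPECIAL_decompose> Who founded X?")` yields `("decompose", "Who founded X?")`, which a caller could route to the retriever, while an `"answer"` operation would end the loop.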

3. Retrieval and Ranking Mechanisms

RQ-RAG retrieves documents using a configurable hybrid dense–sparse pipeline. Key features include:

  • Embedding-Based Retrieval: Compute $E(q')$ and $E(d)$, then score by cosine similarity:

$$\text{sim}(q', d) = \frac{E(q') \cdot E(d)}{\|E(q')\|\,\|E(d)\|}$$

  • BM25 Keyword Retrieval: Standard BM25 implementation over chunked documents.
  • Hybrid and Re-Ranking: Merge top-$k$ lists from both retrieval modes, followed by cross-encoder reranking (e.g., bge-reranker-v2). PreQRAG achieves up to 14% MRR gain on single-document queries after rewriting and reranking (Martinez et al., 20 Jun 2025).
  • Relevance Boosting: For regulated domains (e.g., compliance), "internal" documents can receive a multiplicative trust factor $\tau$. Final scores are computed as

$$s_{\text{final}}(d) = \begin{cases} \tau \cdot s_{\text{boosted}}(d), & d \in \text{Internal} \\ s_{\text{boosted}}(d), & \text{otherwise} \end{cases}$$

This mechanism favors retrieval from sanctioned corpora (Hillebrand et al., 22 Jul 2025).
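
The scoring steps in this section can be illustrated with a small, dependency-free sketch. One point is an assumption rather than something the source states: $s_{\text{boosted}}$ is taken here to be a weighted sum $\alpha \cdot s_{\text{vector}} + \beta \cdot s_{\text{BM25}}$ over scores already normalized to a common range, which is one plausible reading of the $\alpha/\beta$ weights listed among the hyperparameters. Function and parameter names are hypothetical, and cross-encoder reranking is omitted.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def final_scores(
    vector_scores: Dict[str, float],   # doc id -> normalized dense score
    bm25_scores: Dict[str, float],     # doc id -> normalized BM25 score (assumed)
    internal: Set[str],                # ids of internal / sanctioned documents
    alpha: float = 0.5,
    beta: float = 0.5,
    tau: float = 2.0,
) -> List[Tuple[str, float]]:
    """Merge both result lists, apply the trust multiplier, and rank descending."""
    scores: Dict[str, float] = {}
    for doc_id in set(vector_scores) | set(bm25_scores):
        # ASSUMPTION: s_boosted is a weighted sum of the two retrieval scores.
        boosted = alpha * vector_scores.get(doc_id, 0.0) + beta * bm25_scores.get(doc_id, 0.0)
        scores[doc_id] = tau * boosted if doc_id in internal else boosted
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With $\tau = 1$ the ranking reduces to the plain hybrid score, so the multiplier can be evaluated in isolation when tuning.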

4. Generation and Prompting Methods

  • LLM Base: Llama2-7B, GPT-4o, Falcon-3B-Instruct, or similar autoregressive models.
  • Prompting Protocol: Supply the current query (original and refined), retrieved context, and explicit instructions:
    • Chain-of-Thought: "Lay out your full thought process."
    • Citation Format: Force LLM to append evidence tags (e.g., (doc_name/chunk_id)).
    • Hallucination Guard: If evidence is not found, instruct the model to state "information not present" (Hillebrand et al., 22 Jul 2025).
  • No Fine-tuning Required: Empirical results show that prompt-based constraints alone suffice for substantial performance gains, though fine-tuning on citation correctness and answer quality can be added:

$$\mathcal{L} = \mathcal{L}_{\text{CE}}(y_{\text{pred}}, y_{\text{true}}) + \lambda \mathcal{L}_{\text{cite}}(c_{\text{pred}}, c_{\text{true}})$$

(Hillebrand et al., 22 Jul 2025).
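
A prompt that combines the three instructions from the prompting protocol might be assembled as follows. This is only a sketch: the exact wording used in the cited systems is not given in the text, so the template, the `build_prompt` name, and the `(doc_name, chunk_id, text)` tuple layout are assumptions.

```python
from typing import List, Tuple


def build_prompt(original: str, refined: str, chunks: List[Tuple[str, int, str]]) -> str:
    """Assemble a generation prompt with chain-of-thought, citation, and guard rules.

    chunks holds (doc_name, chunk_id, text) tuples; each is prefixed with the
    same (doc_name/chunk_id) tag the model is asked to cite.
    """
    context = "\n".join(f"({name}/{cid}) {text}" for name, cid, text in chunks)
    return (
        "Answer the question using only the context below.\n"
        "Lay out your full thought process.\n"
        "After each claim, cite its source as (doc_name/chunk_id).\n"
        'If the context does not contain the answer, reply "information not present".\n\n'
        f"Original question: {original}\n"
        f"Refined query: {refined}\n\n"
        f"Context:\n{context}\n"
    )
```

Tagging each chunk with the identifier the model must cite keeps the citation format checkable after generation, since every emitted tag can be matched against the chunks that were actually supplied.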

5. Hyperparameters and Tuning

RQ-RAG exposes several performance-critical hyperparameters:

| Hyperparameter | Values / Typical Range | Effect |
| --- | --- | --- |
| Max Chunk Size | {256, 512, 1024, 2048} | Trade-off: recall vs. context granularity |
| Min Overlap | {32, 64, 128, 256} | Context coherence |
| $k$ in top-$k$ retrieval | {5, 10, 20} | Recall vs. noise |
| Search Type | {Vector, Text, Hybrid} | Balanced lexical–semantic coverage |
| Relevance Boosting Flag | {On, Off} | Preference for internal/sanctioned docs |
| Embedding Model | {ada-002, text-embedding-3-large} | Embedding fidelity, coverage |
| Trust Multiplier $\tau$ | [1.0, 3.0] | Strength of relevance boosting |
| Boosted Score Weights $\alpha/\beta$ | e.g., 0.5/0.5, 0.7/0.3 | Hybridization of vector/BM25 relevance |

Optimal performance is typically observed with chunk size 512, overlap 64, $k = 10$, hybrid search enabled, and $\tau = 2$ for trust-based boosting (Hillebrand et al., 22 Jul 2025).
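
To make the chunk size and overlap settings concrete, the sketch below bundles the reported defaults into a config object and applies a fixed-stride sliding window. The `RetrievalConfig` class, its field names, and the sliding-window interpretation of "Min Overlap" are assumptions; the source does not specify how chunks are cut, and real systems often split on sentence or section boundaries instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetrievalConfig:
    # Defaults mirror the settings reported as performing best above.
    max_chunk_size: int = 512       # tokens per chunk
    min_overlap: int = 64           # tokens shared by consecutive chunks
    top_k: int = 10
    search_type: str = "hybrid"     # "vector", "text", or "hybrid"
    relevance_boosting: bool = True
    trust_multiplier: float = 2.0   # tau


def chunk_tokens(tokens: List[str], cfg: RetrievalConfig) -> List[List[str]]:
    """Split a token list into overlapping windows (ASSUMED fixed-stride scheme)."""
    if cfg.min_overlap >= cfg.max_chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    stride = cfg.max_chunk_size - cfg.min_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + cfg.max_chunk_size])
        if start + cfg.max_chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With the defaults the stride is 448 tokens, so a 1,000-token document yields three chunks, the last one shorter than the rest.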

6. Empirical Results and Evaluation

Comprehensive quantitative and qualitative benchmarking validates RQ-RAG's efficacy:

  • Regulatory Compliance QA (Hillebrand et al., 22 Jul 2025):
    • RQ-RAG (hybrid, boosted) outperforms baseline RAG (ada-002, BM25 only):
    • Answer Correctness: 3.79 vs. 3.61 (+0.18, about 5% relative)
    • Context Correctness: 2.90 vs. 2.75 (+0.15, about 5.5% relative)
    • Statistical significance: $p < 0.01$ (paired t-test, $n = 124$)
  • Open-Domain and Multi-hop QA (Chan et al., 2024):
    • RQ-RAG surpasses Self-RAG on single-hop tasks by 1.9 percentage points (68.3% vs. 66.4% accuracy).
    • Excels in multi-hop (HotpotQA, 2Wiki, MuSiQue): +22.6% average F1 over SFT baselines.
    • Performance is robust across retrieval providers (variance 0.7% vs. 1.8%).
  • Retrieval and Reranking Efficiency (Martinez et al., 20 Jun 2025):
    • Rewriting boosts BM25 MRR by 13.34%; dense MRR by 14.2%.
    • Reranking more than doubles top-1 precision for single-document queries.
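
For readers who want to reproduce the arithmetic behind these figures, the sketch below computes a relative gain and a mean reciprocal rank (MRR). Both helpers are illustrative and use the standard definitions; the cited papers' evaluation scripts are not available in the text, so nothing here should be read as their code.

```python
from typing import List, Optional


def relative_gain(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return 100.0 * (new - base) / base


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """MRR over queries; each entry is the 1-based rank of the first relevant
    document, or None when no relevant document was retrieved."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)


# Answer correctness 3.79 vs. 3.61 from the compliance evaluation:
print(round(relative_gain(3.79, 3.61), 2))   # 4.99
# Context correctness 2.90 vs. 2.75:
print(round(relative_gain(2.90, 2.75), 2))   # 5.45
```

The two printed values show that the roughly 5% and 5.5% improvements quoted for the compliance benchmark are relative to the baseline score, not differences on the rating scale itself.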

7. Best Practices, Lessons, and Deployment Considerations

  • Chunking & Overlap: A 512-token chunk/64-token overlap provides optimal balance for context recall and retrieval tractability.
  • Hybrid Retrieval as Default: Combining sparse (BM25) and dense (embedding) search maximizes both lexical and semantic coverage.
  • Upstream Query Engineering Matters: Explicit classification, rewriting (for single-doc), and decomposition (for multi-doc) yield large retrieval and generation gains with minimal computational cost (Martinez et al., 20 Jun 2025).
  • Trust-Weighted Relevance: Relevance boosting for internal/sanctioned documents is essential in regulated domains to avoid compliance risk.
  • Prompt-Enforced Citations: Strict in-context citation formats in generation prompts markedly reduce LLM hallucination (Hillebrand et al., 22 Jul 2025).
  • Monitoring and Drift Detection: Regularly track source domain distribution and employ A/B testing on trust multipliers to align retrieval with policy objectives.
  • Re-indexing: Refresh indices (full or embedding hot reload) at high frequency (e.g., every 24h) for timely adaptation to changing regulatory content.
  • Ablation and Alerting: Automatically trigger manual review upon significant drops in correctness metrics (>5% G-Eval drop over two weeks).
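
The alerting rule in the last item could be implemented along these lines. The source only states the threshold (a drop of more than 5% in G-Eval over two weeks); comparing the mean of the latest 14 daily scores against the mean of the preceding 14 is an assumption made for this sketch, as are the function and parameter names.

```python
from typing import Sequence


def needs_review(
    daily_scores: Sequence[float],  # one correctness score per day, oldest first
    window: int = 14,               # two weeks
    threshold: float = 0.05,        # 5% relative drop
) -> bool:
    """Flag a manual review when the recent window mean falls more than
    `threshold` (relative) below the mean of the window before it."""
    if len(daily_scores) < 2 * window:
        return False  # not enough history to compare two windows
    previous = daily_scores[-2 * window:-window]
    recent = daily_scores[-window:]
    previous_mean = sum(previous) / window
    recent_mean = sum(recent) / window
    if previous_mean <= 0:
        return False  # avoid dividing by zero on degenerate data
    return (previous_mean - recent_mean) / previous_mean > threshold
```

Averaging over windows rather than comparing two single days makes the check less sensitive to day-to-day noise in the evaluation scores.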

RQ-RAG frameworks have established a new standard for query-sensitive, retrieval-augmented LLM systems, providing rigorously validated gains in both answer and context correctness across regulatory, open-domain, and multi-hop environments (Chan et al., 2024, Hillebrand et al., 22 Jul 2025, Martinez et al., 20 Jun 2025). The combination of explicit refinement operations, hybrid trusted retrieval, and strong prompting constraints enables robust, compliant, and efficient deployment in settings with complex information governance requirements.
