
Dynamic RAG: Adaptive Retrieval in LLMs

Updated 14 November 2025
  • Dynamic RAG is a framework where large language models actively retrieve external data during inference using feedback-driven, iterative queries.
  • It enhances accuracy and efficiency by interleaving retrieval with generation based on uncertainty signals and multi-hop reasoning.
  • Dynamic RAG employs techniques like dynamic k-selection, reinforcement learning, and multi-armed bandit strategies to balance retrieval cost with performance.

Dynamic Retrieval-Augmented Generation (Dynamic RAG) refers to a class of architectures and methodologies in which LLMs interact with external knowledge sources in a non-static, query- or context-adaptive manner during inference and/or generation. Unlike traditional RAG pipelines, which retrieve a fixed set of documents per input query (typically via a one-shot similarity search), Dynamic RAG systems adaptively decide what, when, and how much to retrieve, often interleaving retrieval with generation and utilizing explicit feedback to optimize both pipeline components.

1. Foundations and Motivation

The foundational goal of Retrieval-Augmented Generation is to overcome the knowledge limitations and hallucination tendencies of LLMs by grounding their outputs in external, dynamically retrieved information. In standard RAG, a retriever—usually a bi-encoder—fetches a set of relevant contexts for a given query, which are then concatenated (or otherwise incorporated) into the generation context for the LLM. However, this static paradigm faces several well-documented limitations (Gupta et al., 3 Oct 2024):

  • Inflexibility: Static retrieval at $t=0$ assumes the query fully specifies all information needs, which is often false in multi-hop, ambiguous, or evolving scenarios.
  • Long-context inefficiency: As the number or length of retrieved passages grows, softmax attention becomes diluted (“attention dilution”), leading to degraded focus and increased entropy in the model’s information distribution (Wang et al., 7 Aug 2025).
  • Missed Relevance: Important but indirect or dynamically relevant evidence is often missed because static scoring cannot adapt to intermediate findings (Hei et al., 11 Jun 2024).

Dynamic RAG methodologies address these challenges by introducing mechanisms for iterative, state-aware, or feedback-driven retrieval aimed at (a) maximizing answer accuracy and factual grounding, (b) minimizing unnecessary context, (c) improving robustness to ambiguity and multi-step reasoning, and (d) enhancing efficiency by limiting LLM calls and context expansion (Sun et al., 12 May 2025, Tang et al., 2 Dec 2024, Su et al., 7 Jun 2025).

2. Key Architectural Patterns in Dynamic RAG

2.1 Interleaved Retrieval and Generation

Dynamic RAG decomposes the classic retrieve-then-generate pipeline into more granular, often iterative steps:

  • Trigger Policies: The system computes, at each generation step $t$, an uncertainty or information-salience signal $u(s_t)$ (e.g., predictive entropy over next-token probabilities), invoking retrieval when this surpasses a learned or hand-tuned threshold $\tau$ (Su et al., 7 Jun 2025).
  • State-Dependent Queries: Instead of statically encoding the original query, the retrieval query at time $t$ can be a function $f_{\text{query}}(s_t)$ of the generation history or internal hidden state, e.g., $q_t = \text{MLP}([q; h_t])$ (He et al., 28 Apr 2025).
  • Context Update: Retrieved documents $D_t$ are injected into the context via concatenation or as special tokens, and decoding continues until end-of-sequence or another retrieval is triggered.
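
A minimal sketch of this interleaved loop is shown below. It assumes hypothetical `lm.step` and `retriever.search` interfaces and uses predictive entropy as the trigger signal; it is an illustration of the pattern, not any specific paper's implementation.

```python
import math

def token_entropy(probs):
    """Predictive entropy of the next-token distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def interleaved_rag(query, lm, retriever, tau=2.5, max_steps=256):
    """Generate step by step; retrieve only when uncertainty exceeds tau.

    `lm.step(context)` is assumed to return (next_token, next_token_probs), and
    `retriever.search(q, k)` a list of passage strings -- both hypothetical APIs.
    """
    context, output = [query], []
    for _ in range(max_steps):
        token, probs = lm.step(context + output)
        if token_entropy(probs) >= tau:                        # trigger policy pi_trigger
            sub_query = query + " " + " ".join(output[-32:])   # state-dependent query
            docs = retriever.search(sub_query, k=3)
            context = [query] + docs                           # context update with D_t
            token, probs = lm.step(context + output)           # re-decode with new evidence
        output.append(token)
        if token == "<eos>":
            break
    return " ".join(output)
```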

2.2 Dynamic Relevance Mining

Two-stage retrieval, as typified by DR-RAG, seeks to capture both “static-relevant” and “dynamic-relevant” evidence. The second stage uses query reformulation based on initially retrieved evidence to surface documents whose relevance emerges only given prior findings (Hei et al., 11 Jun 2024).
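The following is an illustrative rendering of the two-stage idea (not the authors' code), assuming a generic `retriever.search(query, k)` API and simple string concatenation as the query-reformulation step.

```python
def two_stage_retrieve(query, retriever, k1=5, k2=3):
    """Illustrative two-stage retrieval in the spirit of DR-RAG.

    Stage 1 fetches statically relevant documents; stage 2 reformulates the
    query with each stage-1 document so that evidence whose relevance only
    emerges in context ("dynamic-relevant") can surface.
    """
    static_docs = retriever.search(query, k=k1)
    dynamic_docs = []
    for doc in static_docs:
        reformulated = f"{query} [SEP] {doc}"          # condition on prior findings
        dynamic_docs.extend(retriever.search(reformulated, k=k2))
    # De-duplicate while preserving retrieval order.
    seen, merged = set(), []
    for d in static_docs + dynamic_docs:
        if d not in seen:
            seen.add(d)
            merged.append(d)
    return merged
```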

2.3 Dynamic k-Selection and Ordering

DynamicRAG formalizes the reranker as an RL agent that outputs both a permutation over retrieved documents and a dynamic stopping point, thereby determining the optimal $k$ per query (Sun et al., 12 May 2025). The agent is trained on a composite reward signal derived from the quality of the final generated answer, encompassing exact match, semantic similarity, fluency, and LLM-based validation.
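
A schematic, non-learned stand-in for such a policy is sketched below: a score threshold plays the role of the STOP action and thereby fixes both the ordering and $k$. The real agent replaces both the scores and the stopping rule with learned components.

```python
def select_documents(policy_scores, stop_threshold=0.0):
    """Emit documents in descending score order until the next best score
    falls below a STOP threshold, yielding both a permutation and a dynamic k.
    Purely illustrative; scores and threshold stand in for a trained policy."""
    order = sorted(range(len(policy_scores)),
                   key=lambda i: policy_scores[i], reverse=True)
    selected = []
    for i in order:
        if policy_scores[i] < stop_threshold:   # the STOP action
            break
        selected.append(i)
    return selected                             # ordering and k in one output
```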

2.4 Multi-Armed Bandit Retrieval Selection

MBA-RAG operationalizes adaptive method selection through a multi-armed bandit (MAB) algorithm, where each arm represents a retrieval strategy (e.g., zero-step, one-shot, multi-step retrieval), and the agent optimizes a cost-sensitive reward that trades off answer accuracy against retrieval cost (Tang et al., 2 Dec 2024).
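A simplified epsilon-greedy stand-in for this idea is sketched below; the arm set, per-arm costs, and reward shaping are illustrative assumptions rather than MBA-RAG's exact formulation.

```python
import random

class BanditRetrievalSelector:
    """Epsilon-greedy bandit over retrieval strategies, as a minimal sketch of
    the MAB idea: each arm is a retrieval strategy, and the reward trades off
    answer accuracy against that strategy's retrieval cost."""

    ARM_COST = {"no_retrieval": 0.0, "single_step": 0.1, "multi_step": 0.3}  # assumed costs

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.ARM_COST}
        self.values = {arm: 0.0 for arm in self.ARM_COST}   # running mean reward per arm

    def select(self):
        """Explore with probability epsilon, otherwise exploit the best arm so far."""
        if random.random() < self.epsilon:
            return random.choice(list(self.ARM_COST))
        return max(self.values, key=self.values.get)

    def update(self, arm, answer_correct):
        """Cost-sensitive reward: accuracy (0/1) minus the arm's retrieval cost."""
        reward = float(answer_correct) - self.ARM_COST[arm]
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```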

2.5 Graph- and Plan-Based Modularization

Plan*RAG and DyG-RAG decompose complex queries into explicit DAGs of sub-questions (Plan*RAG) or temporally ordered event-entity graphs (DyG-RAG), enabling fine-grained, parallelizable, and causally faithful multi-hop retrieval and answer synthesis (Verma et al., 28 Oct 2024, Sun et al., 16 Jul 2025).
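The sketch below shows the DAG-of-sub-questions pattern in generic form, loosely in the spirit of Plan*RAG: sub-questions are answered in dependency order, and each node sees only the answers of its parents. The `answer_subquestion` callable is a hypothetical stand-in for per-node retrieval and generation.

```python
from graphlib import TopologicalSorter

def answer_with_plan(plan, answer_subquestion):
    """Resolve a query decomposed into a DAG of sub-questions.

    `plan` maps each sub-question to the sub-questions it depends on;
    `answer_subquestion(q, context)` retrieves evidence for one node and
    answers it given its parents' answers (hypothetical interface)."""
    answers = {}
    for node in TopologicalSorter(plan).static_order():   # dependencies come first
        context = {dep: answers[dep] for dep in plan.get(node, ())}
        answers[node] = answer_subquestion(node, context)
    return answers                                        # the sink node holds the final answer

# Example plan: the comparison depends on two independent lookups, which a
# dynamic RAG system could retrieve for in parallel.
plan = {
    "When was company A founded?": set(),
    "When was company B founded?": set(),
    "Which company is older?": {"When was company A founded?",
                                "When was company B founded?"},
}
```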

3. Formal Modeling and Algorithmic Instantiations

Let $q$ be the query, $y_{1:t-1}$ the generated tokens so far, $s_t$ the model state, and $R$ the retrieval oracle:

  • Adaptive Retrieval Policy:

$$
\pi_{\text{trigger}}(s_t) = \begin{cases} 1 & \text{if } u(s_t) \geq \tau \\ 0 & \text{otherwise} \end{cases}
$$

where $u(s_t)$ is, for example, the entropy of the model’s predictive distribution (Su et al., 7 Jun 2025).

  • Dynamic k-Selection:

The RL-trained policy $\pi_{\theta_r}$ outputs a sequence of document indices $(a_1, \dots, a_T)$ terminated by a STOP token, defining both the order and $k$. Joint reward:

$$
r = \alpha\,\mathrm{EM} + \beta\,\mathrm{SS} + \gamma\,\mathrm{TF} + \lambda\,\mathrm{LP} + \delta\,\mathrm{LLM\text{-}Eval}
$$

(Sun et al., 12 May 2025).
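
As a sketch only, the composite reward reduces to a weighted sum of its components once they have been scored; the weights and the normalization assumed here are placeholders, not the authors' reported settings.

```python
def composite_reward(em, ss, tf, lp, llm_eval,
                     alpha=1.0, beta=0.5, gamma=0.25, lam=0.25, delta=0.5):
    """Weighted composite reward of the form r = a*EM + b*SS + g*TF + l*LP + d*LLM-Eval.
    Component scores are assumed to be pre-computed and normalized to [0, 1]."""
    return alpha * em + beta * ss + gamma * tf + lam * lp + delta * llm_eval
```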

  • Iterative Multi-Hop Retrieval (CDF-RAG):

At each hop for a refined query $\hat{q}$, retrieve semantic and causal paths, aggregate them as $\mathcal{K} = T_{\text{sem}} \cup C_{\text{graph}}$, condition generation on $\mathcal{K}$, and validate the output with causal and hallucination metrics (Khatibi et al., 17 Apr 2025).
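
A compact sketch of such a hop loop follows, with `refine`, `retrieve_semantic`, `retrieve_causal`, `generate`, and `validate` as hypothetical stand-ins for CDF-RAG's components rather than its actual API.

```python
def multi_hop_retrieve(query, refine, retrieve_semantic, retrieve_causal,
                       generate, validate, max_hops=3):
    """Iterative multi-hop loop: refine the query, aggregate semantic passages
    with causal-graph paths, generate, and validate until the answer passes."""
    q_hat = query
    for _ in range(max_hops):
        t_sem = retrieve_semantic(q_hat)          # semantic passages T_sem
        c_graph = retrieve_causal(q_hat)          # causal paths C_graph
        knowledge = list(t_sem) + list(c_graph)   # K = T_sem U C_graph
        answer = generate(query, knowledge)
        if validate(answer, knowledge):           # causal / hallucination checks
            return answer
        q_hat = refine(q_hat, answer)             # refine the query for the next hop
    return answer
```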

  • Entropy-Invariant Attention Scaling (BEE-RAG):

Incorporate a balancing factor $\beta_i$, adaptively determined via moment matching (Intrinsic Multi-Importance Inference or parameter-efficient fine-tuning), into softmax attention to enforce context-length-invariant entropy:

$$
a_{i,j} = \frac{\exp\!\left(\dfrac{\mathbf{q}_i \cdot \mathbf{k}_j}{\sqrt{d} + \beta_i}\right)}{\sum_{l=1}^{n} \exp\!\left(\dfrac{\mathbf{q}_i \cdot \mathbf{k}_l}{\sqrt{d} + \beta_i}\right)}
$$

(Wang et al., 7 Aug 2025).
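
A minimal NumPy sketch of attention with the per-query balancing factor from the equation above; how $\beta_i$ is actually estimated (moment matching or parameter-efficient fine-tuning) is outside the scope of this sketch.

```python
import numpy as np

def balanced_attention(Q, K, beta):
    """Softmax attention with a per-query balancing factor beta_i added to the
    sqrt(d) temperature, following the entropy-invariant form above.

    Q: (n, d) queries, K: (n, d) keys, beta: (n,) balancing factors.
    Returns the (n, n) attention matrix a_{i,j}.
    """
    d = Q.shape[-1]
    scale = np.sqrt(d) + beta[:, None]             # per-row temperature sqrt(d) + beta_i
    scores = (Q @ K.T) / scale
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```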

4. Training, Optimization, and Feedback Mechanisms

Dynamic RAG controllers are typically trained using one or more of the following paradigms:

  • Supervised Behavioral Cloning: Imitating expert selection, ordering, or refinement moves observed from a static reranker, robust pathway, or workflow (Sun et al., 12 May 2025).
  • Direct Preference Optimization (DPO): Sampling alternative document selection/ordering trajectories, comparing their induced generation rewards, and optimizing the policy to prefer higher-reward paths (Sun et al., 12 May 2025, Leng et al., 7 Oct 2025).
  • Reinforcement Learning: Using PPO or bandit algorithms to adapt retrieval policies or query-refinement modules based on downstream reward signals combining coverage, context, causal depth, hallucination rates, and end-task metrics (Khatibi et al., 17 Apr 2025, Tang et al., 2 Dec 2024).
  • Contrastive Retraining: The R³ framework “closes the loop” between retriever and generator, labeling positive/negative retrievals using on-policy LLM outputs, and applying a contrastive loss on retriever encodings guided by answer quality (Zhou et al., 28 Oct 2025).
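
To make the contrastive retraining idea concrete, below is a generic InfoNCE-style loss over retriever embeddings, where on-policy answer quality is assumed to have already supplied the positive/negative labels. The loss form, cosine scoring, and temperature are generic assumptions, not the R³ paper's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_retriever_loss(query_emb, pos_doc_emb, neg_doc_embs, temperature=0.05):
    """Contrastive objective for retriever retraining: pull the query embedding
    toward a passage labeled positive by downstream answer quality, push it
    away from negatives.

    query_emb: (d,), pos_doc_emb: (d,), neg_doc_embs: (m, d).
    """
    docs = torch.cat([pos_doc_emb.unsqueeze(0), neg_doc_embs], dim=0)      # (1+m, d)
    logits = F.cosine_similarity(query_emb.unsqueeze(0), docs) / temperature
    target = torch.tensor(0)                          # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
```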

5. Empirical Results, Trade-Offs, and Applications

Dynamic RAG systems achieve substantial empirical gains across QA, fact verification, and complex reasoning tasks:

Table: Selected Empirical Performance

System (Dataset) | Reported Gain / Result
CDF-RAG (MedMCQA) | +16 pp accuracy
DR-RAG (HotpotQA) | +6 EM, +9 F1
MBA-RAG (average over 6 datasets) | +4.4 EM, +4.8 F1, −20% retrieval steps
DynamicRAG (NQ) | EM 48.4 vs. 36.1 for GPT-4o
DecEx-RAG (6 datasets) | +6.3 EM / +5.7 F1 (absolute)
BEE-RAG (Qwen-7B) | +2–3 pp EM/F1 over the best zero-shot RAG baseline

6. Limitations and Open Research Directions

Despite their improved flexibility and performance, Dynamic RAG methods face significant technical challenges and open problems:

  • Retrieval Cost and Latency: Iterative, token-level, or multi-hop retrieval-based architectures may introduce inference latency due to repeated or per-step searches (He et al., 28 Apr 2025, Su et al., 7 Jun 2025).
  • Query Formulation Accuracy: RL-driven or LLM-generated queries can fail to retrieve necessary evidence unless carefully tuned or guided by explicit uncertainty heuristics or chain-of-thought tracing (Su et al., 7 Jun 2025).
  • Credit Assignment and Feedback: Causal attribution of downstream errors or improvements to specific retrieval events is difficult, and naive rewards may induce reward hacking or local optima (Leng et al., 7 Oct 2025).
  • Context Length Management: Even with entropy balancing or compressed embedding approaches (e.g., DRAG), context-window and memory limits remain, especially in high-$k$ or document-rich settings (Shapkin et al., 2023, Wang et al., 7 Aug 2025).
  • End-to-End Differentiability: Many agentic dynamic RAG approaches remain pipeline-based, with limited end-to-end differentiable training of retrieval and generation modules (He et al., 28 Apr 2025, Su et al., 7 Jun 2025).

Future work aims to:

  • Jointly train retrieval and generation with RL on process- or answer-level rewards;
  • Integrate meta-learning for domain or user-adaptive retrieval policies;
  • Develop scalable, privacy-preserving dynamic indices;
  • Further automate chain-of-thought-guided query refinement and retrieval scheduling;
  • Extend dynamic RAG to multi-modal and agentic settings with strong self-reflection and evidence validation (Gupta et al., 3 Oct 2024, Hu et al., 29 May 2025, Su et al., 7 Jun 2025).

7. Specialized Extensions

Dynamic RAG architectures extend beyond text QA to event-centric temporal reasoning (DyG-RAG), multi-turn dialogue (DH-RAG), and multi-modal VQA or reference-guided image generation (ImageRAG, mRAG). In each, adaptivity—whether by dynamic entity/event selection, context pruning, self-reflection, or plan-based decomposition—yields state-of-the-art robustness, accuracy, and efficiency in real-world, knowledge-intensive scenarios (Sun et al., 16 Jul 2025, Zhang et al., 19 Feb 2025, Hu et al., 29 May 2025, Shalev-Arkushin et al., 13 Feb 2025).


Dynamic RAG methods represent the dominant trend in the evolution of RAG, advancing from rigid, batch-mode retrieval to closed-loop, uncertainty-driven, agentic, parallelized, and context-aware knowledge grounding. This shift is marked by demonstrable gains across efficiency and accuracy metrics, explicit modeling of when/what/how much to retrieve, and a growing methodological diversity suited for complex, high-stakes information retrieval in large-scale language generation systems.
