Dynamic Reasoning and Dual-Evolving Retrieval

Updated 19 April 2026

Dynamic Reasoning is an adaptive process that updates an agent's plan in response to evolving evidence, improving retrieval accuracy.
Dual-Evolving Retrieval refers to the simultaneous refinement of query strategies and evidence pools to overcome static limitations.
Frameworks like D²Plan and MACER empirically demonstrate enhanced efficiency and accuracy by integrating iterative reasoning with dynamic retrieval.

Dynamic Reasoning and Dual-Evolving Retrieval is a paradigm that jointly addresses the challenges of multistep, knowledge-intensive reasoning in LLMs and the limitations of static retrieval pipelines. It integrates adaptive, iterative reasoning with retrieval mechanisms that co-evolve with the agent’s internal state, aiming to emulate the fluid, multi-turn evidence synthesis seen in human problem-solving. This approach underpins recent advances across retrieval-augmented generation (RAG), graph-based retrieval, hybrid memory architectures, and multi-agent reasoning frameworks.

1. Problem Motivation and Failure Modes in Static Retrieval-Augmented Reasoning

Retrieval-augmented LLMs have transformed knowledge-intensive NLP, but traditional static RAG pipelines instantiate the retrieve-then-generate paradigm, fetching a fixed context before generation. Empirical results establish two key failure modes:

Ineffective Search Chain Construction: Naive interleaving of retrieval and reasoning often results in queries that drift off-path, omit key evidence, or fail to recover from errors, which leads to answer failure or hallucination.
Reasoning Hijacking by Peripheral Evidence: As context accumulates, distractors degrade reasoning performance; irrelevant passages can misguide inference or obscure the retrieval of needed facts.

Dual-evolving retrieval and dynamic reasoning frameworks systematically address these points by evolving both the agent’s reasoning plan and the evidence state as new information arrives. In D²Plan, for example, explicit global planning and evidence purification are tightly integrated, overcoming drift and context pollution (Luo et al., 13 Jan 2026). Similar phenomena are observed in conversational agents, graph-based RAG, and hybrid memory systems (Su et al., 7 Jun 2025, Zhao et al., 15 Feb 2026, Wu et al., 26 Sep 2025).

2. Core Principles and Definitions

At its core, dynamic reasoning refers to the explicit, adaptive update of an agent’s global plan (e.g., a sequence of subquestions or a subgraph/DAG traversal) in response to retrieval feedback, success/failure signals, or explicit query sufficiency checks. Dual-evolving retrieval denotes the coordinated evolution of (i) the agent’s information needs (dynamic query adaptation, evolving subquestions), and (ii) the candidate context pool (e.g., evolving subgraphs, chunk sets, or multimodal retrieval budgets). This bidirectional feedback yields closed-loop architectures in which reasoning steps direct retrieval and retrieval outcomes reshape the reasoning plan itself.
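The closed loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `CORPUS`, the exact-match `retrieve` function, and the `REWRITES` table stand in for a real retriever and an LLM-based query rewriter, and none of the names come from the cited frameworks. The point is only that one state object holds both the plan and the evidence pool, and that each step may change either one.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy corpus; a real system would call a search backend here.
CORPUS = {
    "capital of france": "Paris is the capital of France.",
    "population of paris": "Paris has roughly two million residents.",
}

# Hypothetical query rewrites, standing in for LLM-based query adaptation.
REWRITES = {"how many people live in paris": "population of paris"}


@dataclass
class State:
    plan: list[str]                                          # pending sub-questions
    evidence: dict[str, str] = field(default_factory=dict)   # evolving context pool


def retrieve(query: str) -> Optional[str]:
    """Toy exact-match retriever (an assumption, not a real retrieval method)."""
    return CORPUS.get(query.lower())


def step(state: State) -> None:
    """Consume one sub-question; update the evidence pool or the plan."""
    query = state.plan.pop(0)
    passage = retrieve(query)
    if passage is not None:
        state.evidence[query] = passage      # retrieval success grows the evidence pool
        return
    rewritten = REWRITES.get(query.lower())
    if rewritten is not None:
        state.plan.insert(0, rewritten)      # retrieval failure reshapes the plan


def run(plan: list[str], max_steps: int = 10) -> State:
    state = State(plan=list(plan))
    for _ in range(max_steps):
        if not state.plan:
            break
        step(state)
    return state


final = run(["capital of France", "how many people live in Paris"])
```

In this run the second sub-question fails on the first attempt, is rewritten, and succeeds on the retry, so the final state has an empty plan and two evidence entries.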

Canonical instantiations include:

Dual-agent decompositions: Two or more agents explicitly divide the process between planning/reasoning and context evaluation/condensation (e.g., Reasoner and Purifier in D²Plan (Luo et al., 13 Jan 2026), multi-agent MACER in ToG-3 (Wu et al., 26 Sep 2025), exploration–reflection–finalization loops in ViDoRAG (Wang et al., 25 Feb 2025)).
Explicit plan representations and refinement: Reasoning plans are structured as explicit sequences or graphs of subproblems. Plans are refined or revised dynamically in response to outcome signals (Luo et al., 13 Jan 2026, Zai et al., 14 Oct 2025).
Adaptive context indexes: The retrieval pool evolves via node/edge addition/pruning in graphs (Li et al., 3 Aug 2025, Zai et al., 14 Oct 2025, Wu et al., 26 Sep 2025), hybrid top-$K$ selection in multimodal pipelines (Wang et al., 25 Feb 2025), or hierarchical memory in hybrid architectures (Zhao et al., 15 Feb 2026).
Iterative reflection and self-evaluation: Many frameworks deploy LLM-based modules that assess answer sufficiency, diagnose knowledge gaps, and trigger follow-up subqueries (Zhao et al., 15 Feb 2026, Cheng et al., 25 Apr 2025).

3. Frameworks and Architectural Realizations

Dual-Agent Dynamic Planning (D²Plan)

D²Plan formalizes dynamic reasoning with two collaborating LLM agents:

Reasoner ($\mathcal{R}$): Maintains an explicit sequence of sub-questions, dynamically updates this global plan with each evidence update, synthesizes queries, and orchestrates plan refinement or revision upon retrieval failures.
Purifier ($\mathcal{P}$): Receives raw retrieved passages, scores relevance to current sub-questions, condenses relevant content, and signals evidence sufficiency/insufficiency.

At each reasoning step, $\mathcal{R}$’s plan $P_t$ and query $Q_t$ evolve in response to the Purifier’s relevance assessments. The paper specifies these update dynamics in pseudocode and equations (Luo et al., 13 Jan 2026).
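The division of labor between the two roles can be sketched as follows. This is not the pseudocode from the paper: the token-overlap relevance score, the `threshold` value, the `search` and `revise` callables, and the toy passages are all assumptions chosen so the example runs without an LLM. The sketch only shows the control flow in which a Purifier-like function filters passages and reports sufficiency, and a Reasoner-like loop advances or revises the plan accordingly.

```python
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word set with trivial punctuation stripping."""
    return set(text.lower().replace("?", "").replace(".", "").split())


@dataclass
class Purified:
    condensed: list[str]   # passages judged relevant to the sub-question
    sufficient: bool       # sufficiency signal sent back to the Reasoner


def purify(sub_question: str, passages: list[str], threshold: float = 0.5) -> Purified:
    """Purifier stand-in: keep passages whose token overlap meets the threshold."""
    q = tokens(sub_question)
    kept = [p for p in passages if len(q & tokens(p)) / len(q) >= threshold]
    return Purified(condensed=kept, sufficient=bool(kept))


def reason(plan, search, revise, max_revisions: int = 1):
    """Reasoner stand-in: walk the plan, revising a step when evidence is insufficient."""
    plan, notes = list(plan), []
    t, revisions = 0, 0
    while t < len(plan):
        query = plan[t]                          # query derived from the current plan step
        result = purify(query, search(query))
        if result.sufficient:
            notes.extend(result.condensed)       # carry forward only purified evidence
            t, revisions = t + 1, 0
        elif revisions < max_revisions:
            plan[t] = revise(query)              # plan revision after a retrieval failure
            revisions += 1
        else:
            t, revisions = t + 1, 0              # give up on this step and move on
    return plan, notes


# Hypothetical noisy retriever: always returns every passage, including a distractor.
PASSAGES = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Bananas are rich in potassium.",
]


def search(query: str) -> list[str]:
    return list(PASSAGES)


def revise(query: str) -> str:
    """Hypothetical revision rule standing in for an LLM rewrite."""
    return "Warsaw capital of which country?"


final_plan, notes = reason(
    ["Where was Marie Curie born?", "Which nation is it in?"], search, revise
)
```

Here the vague second sub-question matches no passage well enough, so it is revised once and then succeeds, while the distractor passage never enters `notes`.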

Multi-Agent Iterative Loops (MACER, ViDoRAG)

Think-on-Graph 3.0 (ToG-3) and ViDoRAG further generalize dual-evolving architectures by introducing multi-agent interactions:

MACER in ToG-3: A loop involving Constructor (subgraph evolution), Retriever, Reflector (sufficiency assessment and sub-query adaptation), and Responser (answering). At each iteration, the Reflector refines the query and the Constructor updates the subgraph, closing the loop between query evolution and subgraph evolution (Wu et al., 26 Sep 2025).
ViDoRAG: Three agents (Seeker, Inspector, Answer) perform exploration, reflection/summarization, and finalization over multimodal evidence, using GMM-based hybrid retrieval to adaptively select both textual and visual context (Wang et al., 25 Feb 2025).
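The four-role loop described for MACER can be illustrated on a toy knowledge graph. The triples, the function signatures, and the rule-based sufficiency check below are assumptions for illustration only; in the cited system these roles are played by LLM agents over a much richer index. The sketch shows one possible shape of the iteration: the subgraph grows from the current frontier, the retriever looks for the target relation, and the reflector either declares sufficiency or proposes the next frontier.

```python
# Hypothetical triples (head, relation, tail).
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Berlin", "located_in", "Germany"),
]


def constructor(subgraph: set, frontier: set) -> set:
    """Evolve the subgraph by adding triples whose head is in the frontier."""
    return subgraph | {t for t in TRIPLES if t[0] in frontier}


def retriever(subgraph: set, relation: str) -> list:
    """Fetch triples in the current subgraph that carry the target relation."""
    return sorted(t for t in subgraph if t[1] == relation)


def reflector(hits: list, subgraph: set, seen: set):
    """Judge sufficiency; if insufficient, propose unexplored entities as the next frontier."""
    if hits:
        return True, set()
    return False, {t[2] for t in subgraph} - seen


def responser(hits: list):
    """Produce the final answer from the retrieved triples."""
    return hits[0][2] if hits else None


def macer_loop(start: str, relation: str, max_iters: int = 5):
    """Return (answer, iterations used); answer is None if nothing is found."""
    subgraph, frontier, seen = set(), {start}, {start}
    used = 0
    for used in range(1, max_iters + 1):
        subgraph = constructor(subgraph, frontier)
        hits = retriever(subgraph, relation)
        sufficient, frontier = reflector(hits, subgraph, seen)
        if sufficient:
            return responser(hits), used
        if not frontier:
            break                       # nothing left to explore
        seen |= frontier
    return None, used


answer, iterations = macer_loop("Alice", "located_in")
```

Starting from "Alice", the target relation only becomes reachable after the subgraph has been extended twice, which is why a single-shot retrieval over the initial neighborhood would miss it.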

Graph/Hypergraph Dynamic Planning (T-GRAG, PRoH)

Dynamic reasoning extends to richly structured knowledge objects:

T-GRAG: Maintains temporally evolving knowledge graphs, decomposes queries into time-conditioned subquestions, and performs three-stage retrieval (temporal subgraph selection, node filtering, fine-grained knowledge extraction), each step feeding back into reasoning (Li et al., 3 Aug 2025).
PRoH: Constructs a context-aware hypergraph neighborhood, dynamically decomposes questions into a Directed Acyclic Graph (DAG) of subproblems, and evolves both plan and search frontiers based on evidence and entity-weighted overlap metrics (Zai et al., 14 Oct 2025).
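The idea of a plan represented as a DAG of subproblems that can grow during solving can be sketched in a few lines. This sketch covers only the plan side; it does not model hypergraphs, temporal subgraphs, or entity-weighted overlap, and the `answer_fn` and `expand_fn` callables, the node names, and the toy facts are all hypothetical. A subproblem is solved once its prerequisites are answered, and when a subproblem cannot be answered the plan is extended with a new prerequisite.

```python
def solve(dag, answer_fn, expand_fn=None):
    """Solve subproblems in dependency order, extending the DAG when a node is stuck.

    dag maps each subproblem name to the set of subproblems it depends on.
    answer_fn(node, context) returns an answer or None.
    expand_fn(node) returns (new_node, new_node_deps) to add as a prerequisite.
    """
    answers = {}
    pending = {node: set(deps) for node, deps in dag.items()}
    while pending:
        ready = sorted(n for n, deps in pending.items() if deps.issubset(answers))
        if not ready:
            raise ValueError("cycle or unsatisfiable dependency in plan")
        node = ready[0]
        context = {dep: answers[dep] for dep in pending[node]}
        result = answer_fn(node, context)
        if result is None and expand_fn is not None:
            new_node, new_deps = expand_fn(node)
            if new_node not in pending and new_node not in answers:
                pending[new_node] = set(new_deps)   # the plan grows a new subproblem
                pending[node].add(new_node)         # the stuck node now waits on it
                continue
        answers[node] = result
        del pending[node]
    return answers


# Hypothetical facts used by the toy answer function.
def answer_fn(node, context):
    if node == "author":
        return "Ada"
    if node == "birthplace":
        return {"Ada": "London"}.get(context.get("author"))
    if node == "country":
        return {"London": "UK"}.get(context.get("birthplace"))
    return None


def expand_fn(node):
    # Assumed decomposition rule: a stuck "country" node needs a "birthplace" step.
    return ("birthplace", {"author"})


initial_plan = {"author": set(), "country": {"author"}}
answers = solve(initial_plan, answer_fn, expand_fn)
```

The initial plan omits the intermediate hop, so the "country" node fails once, a "birthplace" node is inserted ahead of it, and the revised DAG is then solved in order.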

Conversational Agents and Dual-Process Reasoning (ChatR1, DualRAG, DETOUR)

ChatR1 and DualRAG: Learn dynamic policies via RL: agents flexibly interleave retrieval and reasoning, adapt query trajectories to evolving user intent, and co-adapt retrieval and knowledge aggregation pipelines under joint reward signals (Lupart et al., 15 Oct 2025, Cheng et al., 25 Apr 2025).
DETOUR Benchmark: Operationalizes dual-agent evaluation via a Primary Agent (dynamic, multi-turn retrieval and reasoning) and a fixed Memory Agent (static knowledge), explicitly measuring agents’ ability to accumulate and refine web evidence across turns (Siyan et al., 30 Jan 2026).

4. Training, Objectives, and Algorithmic Foundations

Dynamic reasoning and dual-evolving retrieval require specialized algorithmic and optimization machinery:

Plan-Oriented Reinforcement Learning: Two-stage training protocols are common. In D²Plan, an SFT cold start (using synthesized trajectories) is followed by RL with plan-structured rewards: format, planning, adaptation, and answer rewards guide the Reasoner toward explicit, recoverable reasoning chains (Luo et al., 13 Jan 2026). PPO and REINFORCE estimators are prevalent in optimizing these policies (Lupart et al., 15 Oct 2025).
Sufficiency Reflection Loops: LLM-based reflection modules use completeness checks, self-diagnosis, and query rewrites to drive iterative evidence gathering and plan adaptation (Zhao et al., 15 Feb 2026, Cheng et al., 25 Apr 2025).
Dynamic Query Generation and Retrospective Condensation: Formulations in (Luo et al., 13 Jan 2026, Cheng et al., 25 Apr 2025, Wang et al., 25 Feb 2025) detail per-step query instantiation as functions of the evolving plan and context, while relevance scorers and condensers manage context compaction to preserve only critical evidence.
Explicit Sufficiency/Failure Feedback: Retrievers and reasoners signal retrieval failures, prompting plan revision or context expansion when necessary (Luo et al., 13 Jan 2026, Wu et al., 26 Sep 2025).
Dual-Process Modularization: Frameworks like DualRAG decompose pipelines into Reasoning-Augmented Querying (RaQ) and Progressive Knowledge Aggregation (pKA), operating in a loop of query issuance, document aggregation, and outline update (Cheng et al., 25 Apr 2025).
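As an illustration of how a plan-structured reward with format, planning, adaptation, and answer components might be combined, consider the sketch below. The component definitions, the `Trajectory` fields, and the weights are assumptions made for this example and are not the reward definitions from the cited papers. The second function shows the standard mean-baseline advantage often used with REINFORCE-style estimators.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    well_formed: bool      # output followed the required structure
    plan_steps: int        # number of explicit sub-questions in the plan
    failures: int          # retrieval failures encountered
    recovered: int         # failures that were followed by a plan revision
    answer_correct: bool


def plan_reward(traj: Trajectory, weights=(0.1, 0.2, 0.2, 0.5)) -> float:
    """Weighted sum of four components; weights are illustrative assumptions."""
    r_format = 1.0 if traj.well_formed else 0.0
    r_plan = 1.0 if traj.plan_steps > 0 else 0.0
    # Adaptation: fraction of failures that triggered a revision (1.0 if no failures).
    r_adapt = traj.recovered / traj.failures if traj.failures else 1.0
    r_answer = 1.0 if traj.answer_correct else 0.0
    components = (r_format, r_plan, r_adapt, r_answer)
    return sum(w * r for w, r in zip(weights, components))


def mean_baseline_advantages(rewards: list[float]) -> list[float]:
    """Subtract the batch mean reward, a common variance-reduction baseline."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


good = Trajectory(well_formed=True, plan_steps=3, failures=2, recovered=1, answer_correct=True)
bad = Trajectory(well_formed=True, plan_steps=0, failures=2, recovered=0, answer_correct=False)
rewards = [plan_reward(good), plan_reward(bad)]
advantages = mean_baseline_advantages(rewards)
```

Separating the components this way makes it possible to reward a trajectory for recovering from a retrieval failure even when the final answer is wrong, which is the kind of signal a purely answer-based reward cannot provide.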

5. Empirical Outcomes and Benchmark Performance

Dynamic reasoning with dual-evolving retrieval has robust empirical validation:

D²Plan: Achieves an average improvement of 3.8 percentage points on the LasJ-response metric and gains of more than 4 pp on the hardest multi-hop datasets versus RL-based baselines. Ablations indicate that plan refinement/revision is indispensable, with removal yielding a 15–25% performance loss (Luo et al., 13 Jan 2026).
ToG-3: Demonstrates consistent gains on deep and broad QA benchmarks and wins the majority of head-to-heads against static and single-shot RAG variants. Ablation confirms both evolving query and evolving subgraph mechanisms are critical (Wu et al., 26 Sep 2025).
PRoH and T-GRAG: Outperform GraphRAG and HyperGraphRAG across multi-hop and temporal QA, evidencing particular robustness in high-hop and temporally ambiguous settings (Zai et al., 14 Oct 2025, Li et al., 3 Aug 2025).
ChatR1 and DualRAG: Surpass SFT and static pipeline baselines on F1, BERTScore, and LLM-judged accuracy, with dynamic retrieval cycles and intent-aware reward engineering producing state-of-the-art results on conversational and multi-hop QA (Lupart et al., 15 Oct 2025, Cheng et al., 25 Apr 2025).
ViDoRAG: Outperforms text-only and visual-only baselines by 7–27 points, establishing the efficacy of hybrid retrieval and multi-agent reasoning in visually dense domains (Wang et al., 25 Feb 2025).
Hybrid Memory (HyMem): Delivers >10 point gains over naive RAG and >12x reduction in token cost compared to full-context LLMs, empirically confirming the efficiency–effectiveness balance of dual-tier dynamic scheduling (Zhao et al., 15 Feb 2026).
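The efficiency argument behind tiered memory can be made concrete with a small sketch: consult a cheap summary tier first and escalate to the expensive raw tier only when the summary is judged insufficient. This is not the HyMem design; the two dictionaries, the term-containment sufficiency check (a stand-in for an LLM judgment), and the use of whitespace-separated words as a token-cost proxy are all assumptions for illustration.

```python
# Hypothetical two-tier memory: short summaries and longer raw logs per topic.
SUMMARIES = {"trip": "User plans a trip to Kyoto in May."}
RAW_LOG = {
    "trip": "User said the flight to Kyoto leaves on May 3 from gate B12 "
            "and the hotel is near Gion."
}


def words(text: str) -> list[str]:
    """Whitespace tokens with trailing punctuation removed, lowercased."""
    return [w.strip(".,").lower() for w in text.split()]


def lookup(topic: str, key_terms: set[str]):
    """Return (context, token_cost, tier), escalating only when needed."""
    summary = SUMMARIES[topic]
    cost = len(words(summary))
    # Assumed sufficiency check: every key term must appear in the summary.
    if key_terms <= set(words(summary)):
        return summary, cost, "summary"
    raw = RAW_LOG[topic]
    return raw, cost + len(words(raw)), "raw"


_, cheap_cost, cheap_tier = lookup("trip", {"kyoto", "may"})
_, full_cost, full_tier = lookup("trip", {"gate"})
```

A question about the destination is answered from the summary alone, while a question about the departure gate forces escalation and pays for both tiers, so the average cost depends on how often the cheap tier suffices.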

6. Theoretical Models and Dynamical Systems Foundations

Beyond engineering, dynamic reasoning and dual-evolving retrieval are rooted in theoretical models of sequential evidence integration and attractor transitions. In the input-driven Hopfield framework, sequential memory retrieval is formalized as a two-timescale dynamical system, with fast associative convergence and slow saliency-modulated transitions. Explicit conditions for self-sustained reasoning, stepwise plan handoff, and critical behavior (e.g., a gain threshold) are derived, exemplifying rigorous connections between sequential reasoning and dynamical systems (Betteti et al., 3 Mar 2026).
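The intuition of an input gain that must exceed a threshold before the network hands off from one stored memory to the next can be shown with a classical discrete Hopfield toy. This is not the model from the cited work: it uses a textbook Hebbian weight matrix, two hand-picked orthogonal patterns, a synchronous sign update with an additive input term, and it represents the slow timescale only by holding the input fixed while the fast dynamics settle. The threshold at gain 1 seen below is a property of this toy setup, not a result taken from the paper.

```python
def sign(v: float) -> int:
    return 1 if v >= 0 else -1


# Two orthogonal stored patterns (their dot product is zero).
P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]
N = len(P1)

# Hebbian weights: W[i][j] = (P1[i]*P1[j] + P2[i]*P2[j]) / N
W = [[(P1[i] * P1[j] + P2[i] * P2[j]) / N for j in range(N)] for i in range(N)]


def fast_step(x: list[int], u: list[int], gain: float) -> list[int]:
    """One synchronous update: x <- sign(W x + gain * u)."""
    return [
        sign(sum(W[i][j] * x[j] for j in range(N)) + gain * u[i])
        for i in range(N)
    ]


def settle(x: list[int], u: list[int], gain: float, steps: int = 5) -> list[int]:
    """Run the fast dynamics with the input held fixed."""
    for _ in range(steps):
        x = fast_step(x, u, gain)
    return x


# Start in memory P1 and present P2 as the external input.
weak = settle(P1, P2, gain=0.5)     # input too weak: state stays at P1
strong = settle(P1, P2, gain=1.5)   # input strong enough: state moves to P2
```

Because the patterns are orthogonal, the recurrent field at P1 equals P1 itself, so the input can flip the disagreeing units only when its gain exceeds 1; below that value the current memory persists.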

7. Implications, Open Questions, and Research Directions

The dual-evolving paradigm addresses core limitations of static retrieval by enabling feedback-driven, context-adaptive inference at granular levels, from raw passage selection to subgraph construction and from query generation to answer synthesis. Key implications include:

Resilience to Distractors and Noise: Irrelevant context is systematically filtered, and plan drift is avoided through continual plan revision informed by retrieval outcomes.
Efficiency–Effectiveness Frontier: By activating costly modules only as needed and minimizing context sprawl (e.g., via summaries, memory condensation), dynamic approaches maintain practical inference complexity without sacrificing accuracy (Zhao et al., 15 Feb 2026).
Domain and Modality Generality: Dual-evolving methods prove robust across domains (open-domain QA, legal, biomedical, multimodal) and adapt to both text and structured/multimodal sources (Li et al., 3 Aug 2025, Wang et al., 25 Feb 2025, Zai et al., 14 Oct 2025).
Hybridization with Parametric RAG: Dynamic retrieval answers when and what to fetch; parametric approaches resolve how to inject fetched knowledge into the model. Combined strategies further push the accuracy-efficiency boundary (Su et al., 7 Jun 2025).
Analytical Grounding: Theoretical dynamical system models elucidate essential mechanisms for perpetual sequential retrieval, providing a principled alternative to black-box pipeline design (Betteti et al., 3 Mar 2026).

Open directions include reinforcement learning for sufficiency/quality signals, joint training across retrieval and reasoning, deeper integration of graph and multimodal indices, and principled scheduling/policy learning for adaptive evidence retrieval and plan formation.

Key References: