Recursive Evaluation & Adaptive Planning (REAP)

Updated 14 November 2025
  • Recursive Evaluation and Adaptive Planning (REAP) is a computational paradigm that decomposes complex tasks into recursive sub-tasks with feedback-driven plan refinement.
  • It enables efficient multi-step reasoning by dynamically re-optimizing plans based on runtime evidence, as seen in multi-hop question answering and recursive query processing.
  • Empirical evaluations report performance gains such as improved F1 scores and 2–2.5× speedups, demonstrating its practical impact on diverse, data-intensive applications.

Recursive Evaluation and Adaptive Planning (REAP) is a class of computational frameworks that operationalize the tightly coupled processes of recursive evaluation (the systematic, stepwise breakdown and solution of sub-problems or sub-tasks) and adaptive planning (dynamic, feedback-driven re-optimization or modification of the task-solving plan). REAP methodologies have emerged independently in question-answering with retrieval-augmented generation (RAG), multi-step reasoning for LLM agents, and recursive query processing in database systems. The frameworks systematically manage the uncertainty and combinatorial complexity inherent to long-horizon or recursive problems by maintaining structured plans, explicit context representations, and data-dependent re-evaluation mechanisms.

1. Core Principles and Formal Framework

Central to REAP is the explicit management of a dynamic plan or task graph and a recursive mechanism for progressively solving or refining the problem in light of new feedback. In multi-hop RAG (as in (Zhu et al., 13 Nov 2025)), the REAP architecture decomposes complex questions into a sequence of sub-tasks using a Sub-task Planner (SP), while a Fact Extractor (FE) module recursively evaluates evidence retrieved for each sub-task, yielding structured facts and fulfillment indicators. In recursive query optimization (Herlihy et al., 2023, Liu et al., 2014), REAP entails representing recursive rule dependencies and their associated plans (e.g., join orders) as explicit schemas, which are re-optimized or modified throughout execution in response to runtime statistics.

A generic REAP procedure involves:

  • Maintaining a global plan $\mathcal{P}$ (structured sub-tasks or join operations),
  • Maintaining a growing evidence set $\mathcal{F}$ (facts, observations, or cardinalities),
  • Recursively evaluating and executing the head sub-task (or join),
  • Adapting the remainder of the plan using feedback from the latest evaluation,
  • Iterating until all sub-tasks are resolved or halting criteria are met (a minimal sketch of this loop follows the list).
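The following Python sketch illustrates this generic loop under stated assumptions; `evaluate_head`, `adapt_plan`, and the plan/evidence containers are hypothetical placeholders standing in for the domain-specific evaluator and planner (e.g., FE/SP in multi-hop RAG, or the re-optimizer in a recursive query engine), not an API from the cited papers.

```python
from collections import deque

def reap_loop(initial_plan, evaluate_head, adapt_plan, max_steps=50):
    """Generic REAP skeleton (illustrative sketch).

    initial_plan  : ordered collection of sub-tasks (domain-specific objects).
    evaluate_head : callable(sub_task, evidence) -> one new evidence item.
    adapt_plan    : callable(remaining_plan, evidence) -> revised plan.
    """
    plan = deque(initial_plan)   # global plan P: structured sub-tasks or joins
    evidence = []                # growing evidence set F: facts, observations, ...

    for _ in range(max_steps):
        if not plan:             # halting criterion: all sub-tasks resolved
            break
        head = plan.popleft()    # recursively evaluate/execute the head sub-task
        evidence.append(evaluate_head(head, evidence))
        # adapt the remainder of the plan using feedback from the latest evaluation
        plan = deque(adapt_plan(list(plan), evidence))

    return evidence
```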

2. Methodological Instantiations

2.1 Multi-Hop RAG for Question Answering

In REAP for RAG (Zhu et al., 13 Nov 2025), at each step $t$:

  • The Sub-task Planner (SP) maintains the global plan $\mathcal{P}_t = \{p_1, \dots, p_N\}$, where each $p_i = (\mathit{id}_i, q_i, \mathit{deps}_i)$ encapsulates the sub-query and its dependencies.
  • The Fact Extractor (FE) retrieves documents for a sub-query $q_t$ and extracts a structured fact $f_t = (s_t, e_t, r_t, l_t)$, including
    • Concise factual statement $s_t$
    • Evidence snippets $e_t$
    • Chain-of-thought explanation $r_t$
    • Fulfillment level $l_t \in \{\mathrm{DirectAnswer}, \mathrm{PartialClue}, \mathrm{Failed}\}$.
  • SP uses $f_t$ to update the plan: if $l_t = \mathrm{DirectAnswer}$, downstream sub-queries are rewritten or branched; if $l_t \in \{\mathrm{PartialClue}, \mathrm{Failed}\}$, a Re-Planner module determines repair or rerouting actions.

The process forms a recursive feedback loop. The plan and fact set jointly define the agent's state, and traceability is guaranteed since each sub-task, its dependencies, and supporting evidence remain explicit for the duration.
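A minimal data-structure sketch of this loop is given below. The dataclass names and the string-substitution rule for rewriting downstream sub-queries are assumptions for illustration; the actual REAP system in (Zhu et al., 13 Nov 2025) uses LLM-backed SP, FE, and Re-Planner modules rather than the toy update rule shown here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubTask:
    """One plan entry p_i = (id_i, q_i, deps_i)."""
    id: str
    query: str
    deps: List[str] = field(default_factory=list)

@dataclass
class Fact:
    """One extracted fact f_t = (s_t, e_t, r_t, l_t)."""
    statement: str        # s_t: concise factual statement
    evidence: List[str]   # e_t: supporting snippets
    rationale: str        # r_t: chain-of-thought explanation
    fulfillment: str      # l_t: "DirectAnswer" | "PartialClue" | "Failed"

def update_plan(plan: List[SubTask], resolved: SubTask, fact: Fact) -> List[SubTask]:
    """Toy Sub-task Planner update on receipt of a new fact."""
    remaining = [p for p in plan if p.id != resolved.id]
    if fact.fulfillment == "DirectAnswer":
        # rewrite downstream sub-queries that depended on the resolved sub-task
        # (the "#id" placeholder convention is hypothetical)
        for p in remaining:
            if resolved.id in p.deps:
                p.query = p.query.replace(f"#{resolved.id}", fact.statement)
    else:
        # PartialClue / Failed: a Re-Planner module would decide how to repair
        # or reroute the sub-task; omitted in this sketch.
        pass
    return remaining
```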

2.2 Recursive Query Optimization

REAP in database systems (Herlihy et al., 2023, Liu et al., 2014) uses adaptive metaprogramming and declarative, incremental pruning for recursive Datalog queries:

  • At compile time, recursive rules are parsed into recursion schemas (see RecSchema), parameterized by join order and runtime statistics.
  • During each fix-point iteration, runtime cardinalities and selectivities are measured. If deviation thresholds are exceeded, the planner re-invokes code generation with updated statistics, yielding new optimized join routines with staged code (via metaprogramming); a minimal sketch of this trigger logic follows the list.
  • An incremental Datalog engine maintains the plan search space, costs, and memoization tables. Upon receiving cost deltas, it updates only the affected portions of the AND/OR plan graph using aggregate selection, tuple source suppression, reference counting, and recursive bounding.
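The sketch below shows how such a deviation-triggered re-planning step can sit inside a semi-naive fix-point loop. The rule objects, the `estimate` and `regenerate_join_code` hooks, and the threshold value are assumptions for illustration and do not correspond to the Carac or incremental-engine APIs.

```python
def adaptive_fixpoint(rules, initial_facts, estimate, regenerate_join_code,
                      deviation_threshold=2.0):
    """Semi-naive fix-point evaluation with adaptive join re-planning (sketch).

    estimate(rule)              : optimizer's cardinality estimate for a rule.
    regenerate_join_code(stats) : hypothetical hook that re-runs staged code
                                  generation using the observed statistics and
                                  returns a new join routine.
    """
    total, delta = set(initial_facts), set(initial_facts)
    join = regenerate_join_code({})       # initial plan from compile-time stats

    while delta:
        observed, new_delta = {}, set()
        for rule in rules:
            produced = join(rule, total, delta)   # evaluate one rule on the delta
            observed[rule.name] = len(produced)
            new_delta |= produced - total
        total |= new_delta
        delta = new_delta

        # adaptive planning: re-generate join routines if estimates drifted
        ratios = [observed[r.name] / max(estimate(r), 1) for r in rules]
        if ratios and (max(ratios) > deviation_threshold
                       or min(ratios) < 1 / deviation_threshold):
            join = regenerate_join_code(observed)

    return total
```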

Table: REAP Architectural Modularization

| Domain | Recursive Evaluation | Adaptive Planning |
|---|---|---|
| Multi-hop RAG | Fact Extractor (FE) | Sub-task Planner (SP) |
| LLM Agents (ReCAP) | Subtask Execution | Plan Refinement (ρ), Plan π |
| Query Optimization | Codegen, Factoring | Re-optimizer, Plan Re-generation |

3. Recursive Feedback and Plan Adaptation Mechanisms

A distinguishing feature of REAP frameworks is the recursive coupling between action/evaluation and plan modification, creating a closed-loop feedback process:

  • In (Zhu et al., 13 Nov 2025), SP and FE alternate, updating the plan or response for each sub-task as new facts are found. Plan branching, forking, or pruning is data-dependent and may reroute the search or repair sub-tasks at runtime.
  • In (Zhang et al., 27 Oct 2025) (ReCAP), plan-ahead decomposition first emits the entire subtask list. Execution commits to the head, and after each resolution, the full parent context is re-injected to allow for plan refinement—a strategy ensuring persistent top-level intent and preventing information drift.
  • In query optimization (Herlihy et al., 2023), recursive metaprogramming enables switching between runtime-optimized code fragments, based on empirical performance counters. Triggers such as absolute or relative cardinality deviation or periodic iteration counts activate re-planning.

Explicit context and plan re-injection mechanisms ensure cross-level continuity and memory efficiency, avoiding redundant prompting or state bloat.
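A minimal sketch of this ReCAP-style re-injection pattern is shown below. The `decompose`, `execute`, and `refine` callables stand in for LLM-backed modules, and the loop structure is an interpretation of the description above rather than the paper's implementation.

```python
from typing import Callable, List

def recap_style_loop(goal: str,
                     decompose: Callable[[str], List[str]],
                     execute: Callable[[str, List[str]], str],
                     refine: Callable[[str, List[str], List[str]], List[str]],
                     max_rounds: int = 20) -> List[str]:
    """Plan-ahead decomposition, head-first execution, and context re-injection."""
    plan = decompose(goal)        # emit the entire subtask list up front
    results: List[str] = []

    for _ in range(max_rounds):
        if not plan:
            break
        head, rest = plan[0], plan[1:]
        results.append(execute(head, results))   # commit to the head subtask
        # re-inject the full parent context (top-level goal + remaining plan +
        # results so far) so refinement preserves top-level intent and avoids drift
        plan = refine(goal, rest, results)

    return results
```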

4. Efficiency, Complexity, and Theoretical Guarantees

REAP methods are designed to address both computational tractability and the limitations of classical static (one-shot) planning.

  • Memory: In LLM-based agents, prompt size is upper-bounded by $\mathcal{O}(K\,\bar L)$ (a sliding window of the $K$ most recent rounds, each of length $\bar L$), and state only needs to cover the root-to-current path of the recursion (depth $d$), yielding $\mathcal{O}(d\,\bar L)$ storage, with $d \ll N$ for typical long-horizon tasks (Zhang et al., 27 Oct 2025).
  • Iterations: REAP in multi-hop QA requires an average of ~2.2 iterations per answer, fewer than R1-Searcher (~3.0) or IRCoT/Iter-RetGen (always 5) (Zhu et al., 13 Nov 2025).
  • Incremental Query Optimization (Liu et al., 2014): Delta-propagation and recursive pruning yield re-optimization latencies of 10–100 ms, with observed 10–60× speedups over full re-planning and 75–90% of alternative plans pruned. Aggregate selection, reference counting, and recursive bounding are central to this efficiency.
  • Adaptive Datalog Engines (Herlihy et al., 2023): In Carac, adaptive metaprogramming with staged join-order re-planning produces empirically observed 2.2×–2.5× speedups in Datalog query execution, with 60% of the improvement directly attributed to runtime re-planning.

In all instances, plans and evaluation artifacts are preserved for full traceability, meaning every step and supporting evidence or decision point can be externally validated.

5. Modularization and Training Approaches

Robustness and performance in REAP frameworks are enhanced by modular design and unified training regimes:

  • In multi-hop RAG, REAP employs multi-task fine-tuning over Decomposer, Plan Updater, and Re-Planner datasets. The single planning model $M_\phi$ is trained to minimize $\mathcal{L}_{\mathrm{multi}}(\phi)$, the sum of losses over the individual sub-tasks, sharing inductive biases across planner roles (Zhu et al., 13 Nov 2025); a minimal sketch of this joint objective follows the list. Ablations show joint training outperforms per-task or pooled approaches by up to 1% F1.
  • In LLM planning (ReCAP: (Zhang et al., 27 Oct 2025)), plan and refiner modules are unified via a recursive context-sharing protocol, and action selection, execution, and refinement use shared token-level context to ensure seamless adaptation and minimal drift.
  • In Datalog systems, REAP (as implemented in Carac) generalizes to mutual and non-linear recursion, supports distributed and parallel partitioning, and memoizes optimized plan code for reuse under similar runtime conditions (Herlihy et al., 2023).
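As a concrete reading of the joint objective in the first bullet above, the snippet below sums per-task losses over mixed Decomposer, Plan Updater, and Re-Planner batches under shared parameters. The linear toy model, the loss function, and the batch format are assumptions; the paper fine-tunes an LLM-based planner rather than anything resembling this stand-in.

```python
import torch
import torch.nn as nn

# hypothetical stand-in for the single planning model M_phi
model = nn.Linear(128, 128)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def multi_task_step(batches):
    """One step on L_multi(phi): the sum of the three per-task losses.

    batches: dict mapping task name -> (inputs, targets) drawn from that
    task's dataset; all tasks share the same parameters phi.
    """
    optimizer.zero_grad()
    total_loss = torch.zeros(())
    for task in ("decomposer", "plan_updater", "re_planner"):
        x, y = batches[task]
        total_loss = total_loss + criterion(model(x), y)
    total_loss.backward()
    optimizer.step()
    return float(total_loss)

# usage with random toy data
batches = {t: (torch.randn(8, 128), torch.randn(8, 128))
           for t in ("decomposer", "plan_updater", "re_planner")}
print(multi_task_step(batches))
```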

6. Empirical Evaluation and Applications

Extensive empirical studies substantiate the performance of REAP systems:

| Setting | Metric | REAP Result | Top Baseline | Improvement |
|---|---|---|---|---|
| HotpotQA (RAG) | F1 | 68.0% | 63.4% | +4.6 F1 |
| 2WikiMultihopQA (RAG) | F1 | 79.6% | 69.4% | +10.2 F1 |
| MuSiQue (OOD RAG) | F1 | 38.3% | 33.8% | +4.5 F1 |
| Bamboogle (OOD RAG) | F1 | 65.2% | 58.0% | +7.2 F1 |

Ablations reveal that removing global re-planning, fact verification, or clue extraction causes 3–4% drops in F1 per module (Zhu et al., 13 Nov 2025), confirming that the recursive evaluation and adaptive planning loop is essential.

In Datalog queries, adaptive Carac achieves 2–2.5× speedups over static compilation in industrial analyses (Herlihy et al., 2023). Incremental re-optimization outpaces full re-optimization by 10–100× (Liu et al., 2014).

Applications span large-scale multi-hop question answering, long-horizon RL/LLM agents, and mission-critical recursive analytics in program analysis and data stream management.

7. Traceability, Generalization, and Limitations

REAP frameworks explicitly log every planned step, adaptation, and evidence artifact, enabling full post-hoc auditability. In (Zhu et al., 13 Nov 2025), the tuple $(\mathcal{P}_t, \mathcal{F}_t)$ provides a complete, human-readable reasoning record with all sub-task dependencies and evidence chains.

Empirically, REAP architectures generalize strongly across domains with limited fine-tuning, as demonstrated by substantial out-of-domain gains on benchmarks like MuSiQue and Bamboogle (trained on ~5.5K examples). Adaptive query processors maintain performance as data distributions and workloads change, continually re-optimizing without full restarts.

Known limitations include single-node scope in some implementations, incomplete handling of state-migration costs under large plan divergences, and continuing dependence on cost-model accuracy in query settings (Liu et al., 2014). Distributed and learning-based extensions are active areas for further research.


In conclusion, Recursive Evaluation and Adaptive Planning constitutes a principled, modular paradigm for managing long-horizon, multi-step, or recursive tasks across diverse computation and reasoning settings. It combines structured, feedback-driven planning with traceable, recursion-compatible evaluation to achieve state-of-the-art performance on both in-domain and out-of-domain tasks, while maintaining transparency and efficiency through rigorous modularization and incremental update techniques (Zhu et al., 13 Nov 2025, Zhang et al., 27 Oct 2025, Herlihy et al., 2023, Liu et al., 2014).
