Plan-Guided Adaptive Retrieval (REPAIR)

Updated 23 April 2026

Plan-Guided Adaptive Retrieval (REPAIR) is a framework that integrates explicit, model-generated plans to determine when and how to retrieve information.
It leverages iterative planning, adaptive triggering, and dual-path scoring to enhance efficiency across tasks like multi-hop reasoning and program repair.
Reported results show that REPAIR variants can reduce retrieval calls while improving task metrics. This makes them attractive for knowledge-intensive applications.

Plan-Guided Adaptive Retrieval (REPAIR) is a class of retrieval-augmented generation (RAG) methodologies in which explicit, model-generated plans—derived from user queries, intermediate reasoning, or observed code modifications—dynamically orchestrate information retrieval and selection. REPAIR frameworks use structured planning to determine when, what, and how to retrieve, improving efficiency and quality over naïve or static retrieval baselines. Variants of REPAIR span knowledge-intensive question answering, multi-hop reasoning, program repair, and grounded text generation, tightly coupling retrieval-triggering to internal model confidence, decomposition of complex queries, or iterative repair cycles.

1. Core Paradigms and Problem Formulation

REPAIR methods are unified by the explicit integration of plan generation and retrieval selection, deviating from traditional “retrieve-then-generate” or unplanned iterative RAG. The framework typically involves the following key stages:

Plan Generation: The system generates a structured plan—this may be a pseudo-context (vector or text) (Chen et al., 6 Aug 2025), reasoning DAG (Verma et al., 2024), sub-task decomposition (Zhu et al., 13 Nov 2025), or edit vector (Dai et al., 13 Jan 2026).
Retrieval Triggering: A mechanism determines whether external retrieval is necessary, often based on plan–answer agreement, latent clues, or task success (Chen et al., 6 Aug 2025, Zhu et al., 13 Nov 2025).
Adaptive Retrieval: When required, the plan conditions and guides the retrieval—sometimes alongside the original query—enabling dual-path or stepwise document acquisition (Chen et al., 6 Aug 2025, Verma et al., 2024).
Selection and Fusion: Retrieved candidates are scored and selected via plan- and query-anchored similarity metrics, relevance experts, or iterative reward functions. Top-ranked documents are fused into the evidence context (Chen et al., 6 Aug 2025, Kim et al., 8 Jan 2026).
Iterative or Recursive Refinement: Some variants employ closed feedback loops, repairing or extending the plan in response to partial success or failed steps (Zhu et al., 13 Nov 2025, Dai et al., 13 Jan 2026).
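The stages above can be wired together as a single-pass control flow. The Python sketch below is illustrative only. Every stage is a caller-supplied function with a hypothetical name and signature (`generate_plan`, `needs_retrieval`, `retrieve`, `select`, `answer`). Iterative refinement is omitted, and the toy word-overlap selector in the usage block is an assumption, not any cited system's scorer.

```python
from typing import Callable


def repair_pipeline(
    query: str,
    generate_plan: Callable[[str], str],
    needs_retrieval: Callable[[str, str], bool],
    retrieve: Callable[[str, str], list[str]],
    select: Callable[[str, str, list[str]], list[str]],
    answer: Callable[[str, list[str]], str],
) -> str:
    """Single pass through the REPAIR stages (no iterative refinement)."""
    plan = generate_plan(query)                  # plan generation
    if not needs_retrieval(query, plan):         # retrieval triggering
        return answer(query, [])                 # rely on parametric knowledge
    candidates = retrieve(query, plan)           # plan-conditioned retrieval
    evidence = select(query, plan, candidates)   # selection and fusion
    return answer(query, evidence)


def word_overlap(text: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(text.lower().split()) & set(doc.lower().rstrip(".").split()))


if __name__ == "__main__":
    corpus = ["Paris is the capital of France.", "Mount Everest is in Nepal."]
    result = repair_pipeline(
        "capital of France",
        generate_plan=lambda q: "find the city that is the " + q,
        needs_retrieval=lambda q, p: True,
        retrieve=lambda q, p: corpus,
        select=lambda q, p, docs: sorted(
            docs, key=lambda d: word_overlap(q + " " + p, d), reverse=True
        )[:1],
        answer=lambda q, ev: ev[0] if ev else "unknown",
    )
    print(result)
```

When the trigger declines retrieval, no retriever call is made and the answer function receives an empty evidence list.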

Formally, the triggering decision is often framed as minimizing the expected retrieval cost $\mathbb{E}[R(q)]$ over queries $q$, subject to maintaining or improving end-task accuracy (EM/F1 or domain-specific metrics) (Chen et al., 6 Aug 2025).
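This constrained objective can be approximated empirically. The sketch assumes a labelled development set in which each query's divergence score is known, along with whether it is answered correctly with and without retrieval. It sweeps the threshold $\tau$ and keeps the value with the lowest retrieval rate that still meets an accuracy target. The names (`DevExample`, `evaluate`, `calibrate_tau`) are hypothetical, and the procedure is not taken from any of the cited papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DevExample:
    divergence: float       # D(A_p(q), A_c(q)) measured for this query
    correct_without: bool   # correct when answering from parametric knowledge
    correct_with: bool      # correct when answering with retrieved evidence


def evaluate(examples: list[DevExample], tau: float) -> tuple[float, float]:
    """Accuracy and retrieval rate when retrieving iff divergence > tau."""
    correct = retrieved = 0
    for ex in examples:
        if ex.divergence > tau:
            retrieved += 1
            correct += ex.correct_with
        else:
            correct += ex.correct_without
    n = len(examples)
    return correct / n, retrieved / n


def calibrate_tau(
    examples: list[DevExample], min_accuracy: float
) -> Optional[tuple[float, float, float]]:
    """Return (tau, accuracy, retrieval_rate) with the lowest retrieval rate
    among thresholds meeting min_accuracy, or None if no threshold does."""
    # -inf means "always retrieve"; each observed score is a candidate cut point.
    candidates = [float("-inf")] + sorted({ex.divergence for ex in examples})
    best = None
    for tau in candidates:
        acc, rate = evaluate(examples, tau)
        if acc >= min_accuracy and (best is None or rate < best[2]):
            best = (tau, acc, rate)
    return best
```

The always-retrieve candidate ($\tau = -\infty$) ensures a feasible threshold exists whenever retrieving for every query already meets the accuracy target.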

2. Key Methodological Instantiations

REPAIR has been implemented with diverse designs depending on the target domain and reasoning demands:

Parametric-Verified Adaptive Information Retrieval (PAIRS): For single-hop QA, LLMs generate both an “answer-as-is” and a “plan-augmented” answer. The divergence between these (measured by token-level differences or answer embedding cosine distance) triggers external retrieval only when needed. Dual-path retrieval along both query and plan embeddings, followed by adaptive information selection (AIS), forms the core (Chen et al., 6 Aug 2025).
Recursive Evaluation and Adaptive Planning (REAP): In multi-hop QA, a Sub-task Planner (SP) and Fact Extractor (FE) coordinate to maintain a global task plan and iteratively decompose, retrieve, and synthesize answers. The plan is updated recursively to manage incomplete clues and dynamically reroute failed dependencies (Zhu et al., 13 Nov 2025).
Plan*RAG: Introduces a DAG-based reasoning plan external to the LLM’s memory. Atomic subqueries mapped to DAG nodes enable parallel and dependency-aware retrieval/generation. A “critic expert” dynamically decides per subquery chunk whether new evidence is needed, minimizing retrieval while maintaining attribution (Verma et al., 2024).
Neighborhood-aware Adaptive Retrieval (NAR-REPAIR): For bridge-document identification in reasoning-intensive retrieval, the reranker generates reasoning plans at each window, with “step-adaptive retrieval” guided by explicit reward signals based on plan subgoal coverage and ranking consistency (Kim et al., 8 Jan 2026).
Edit-Driven Program Repair (LSGen-REPAIR): LPR systems encode “edit vectors” extracted from observed buggy-to-fixed submission pairs, retrieving and iteratively refining solution exemplars that best match the plan induced by learner errors. Each repair attempt triggers feedback-driven retrieval updates (Dai et al., 13 Jan 2026).
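Dependency-aware execution of the kind described for Plan*RAG can be sketched with Python's standard `graphlib.TopologicalSorter`. It releases nodes in waves once their predecessors are done, and nodes in the same wave could be processed in parallel. The per-node critic (`needs_evidence`), retriever, and answer function are hypothetical callables standing in for the LLM components; this is not the paper's implementation.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_plan_dag(
    deps: dict[str, set[str]],
    needs_evidence: Callable[[str, dict[str, str]], bool],
    retrieve: Callable[[str], list[str]],
    answer_node: Callable[[str, dict[str, str], list[str]], str],
) -> tuple[dict[str, str], list[list[str]]]:
    """Answer each subquery after its parents; return answers and execution waves.

    deps maps a subquery to the subqueries whose answers it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    answers: dict[str, str] = {}
    waves: list[list[str]] = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())  # mutually independent: parallelizable
        waves.append(wave)
        for node in wave:
            parents = {p: answers[p] for p in deps.get(node, set())}
            # Critic stand-in: retrieve only when the node is judged to need evidence.
            evidence = retrieve(node) if needs_evidence(node, parents) else []
            answers[node] = answer_node(node, parents, evidence)
            sorter.done(node)
    return answers, waves
```

Because parent answers are passed into each node, a downstream subquery can be resolved from upstream results alone when the critic judges no new evidence necessary.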

3. Mathematical and Algorithmic Details

A typical REPAIR pipeline is characterized by:

Trigger Decision:
- $\delta(q) = \begin{cases} 1 & \text{if } D(A_p(q), A_c(q)) > \tau \\ 0 & \text{otherwise} \end{cases}$, with $D(\cdot)$ as the output divergence and $\tau$ a decision threshold (Chen et al., 6 Aug 2025).
Dual-path Retrieval:
- Retrieve $D_q$ (the $n$ documents nearest to $q$) and $D_p$ (the $n$ nearest to the plan $c(q)$), merge them into $D_{2n}$, then score each candidate: $s(d) = \alpha\langle \operatorname{Emb}(q), \operatorname{Emb}(d) \rangle + (1-\alpha)\langle \operatorname{Emb}(c(q)), \operatorname{Emb}(d) \rangle$.
Iterative Reasoning Loop:
- For multi-hop settings, maintain a plan $P_t$ and a fact set $F_t$.
- At each step $t \ge 0$: plan, select actionable sub-tasks, retrieve evidence, extract facts, and update the plan and fact set (Zhu et al., 13 Nov 2025).
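The trigger and dual-path scoring formulas can be made concrete with plain vectors. This sketch makes several assumptions. First, $D$ is taken to be the cosine distance between embeddings of the two answers, one of the divergence measures mentioned for PAIRS. Second, the inner product $\langle\cdot,\cdot\rangle$ is used both for nearest-neighbour search and for $s(d)$. Third, embeddings are supplied by the caller. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def trigger(emb_a_p: Vector, emb_a_c: Vector, tau: float) -> int:
    """delta(q): 1 if the cosine distance between the two answers exceeds tau."""
    divergence = 1.0 - cosine(emb_a_p, emb_a_c)
    return 1 if divergence > tau else 0


def dual_path_select(
    q: Vector, plan: Vector, docs: dict[str, Vector], n: int, alpha: float, k: int
) -> list[str]:
    """Merge the n nearest docs to q and to the plan c(q), rank by s(d), keep top k."""
    d_q = sorted(docs, key=lambda d: dot(q, docs[d]), reverse=True)[:n]
    d_p = sorted(docs, key=lambda d: dot(plan, docs[d]), reverse=True)[:n]
    pool = set(d_q) | set(d_p)  # D_2n: at most 2n documents after de-duplication

    def score(d: str) -> float:
        return alpha * dot(q, docs[d]) + (1 - alpha) * dot(plan, docs[d])

    # Sort by descending score; ties broken by document id for determinism.
    return sorted(pool, key=lambda d: (-score(d), d))[:k]
```

Setting `alpha` closer to 1 favours documents matching the literal query, while smaller values give more weight to the plan.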

Pseudocode follows a loop of plan → retrieval → answer → plan update (Zhu et al., 13 Nov 2025, Kim et al., 8 Jan 2026). REPAIR systems often feature plan updaters, re-planners, and critiquing/relevance selection experts embedded into the retrieval and synthesis loop.
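Below is a minimal sketch of the plan → retrieval → answer → plan-update loop, loosely following the REAP description. The plan $P_t$ is represented as the unresolved sub-tasks together with their current query wording, and $F_t$ maps resolved sub-tasks to extracted facts. The `retrieve`, `extract`, and `replan` callables are hypothetical stand-ins for the retriever, Fact Extractor, and Sub-task Planner. Here, re-planning only rewrites the query of a failed sub-task, which is an assumption.

```python
from typing import Callable, Optional


def reap_style_loop(
    subtasks: list[str],
    deps: dict[str, set[str]],
    retrieve: Callable[[str, dict[str, str]], list[str]],
    extract: Callable[[str, list[str]], Optional[str]],
    replan: Callable[[str, dict[str, str]], Optional[str]],
    max_steps: int = 5,
) -> dict[str, str]:
    """Iterate plan -> retrieve -> extract -> plan update; return the fact set F_t."""
    queries = {s: s for s in subtasks}   # P_t: current wording of each sub-task
    facts: dict[str, str] = {}           # F_t: resolved sub-task -> extracted fact
    for _ in range(max_steps):
        pending = [s for s in subtasks if s not in facts]
        if not pending:
            break
        # A sub-task is actionable once every sub-task it depends on is resolved.
        actionable = [s for s in pending if set(deps.get(s, ())).issubset(facts)]
        if not actionable:
            break
        for s in actionable:
            fact = extract(s, retrieve(queries[s], facts))
            if fact is not None:
                facts[s] = fact
            else:
                # Plan update for a failed step: rewrite the sub-task's query.
                new_query = replan(queries[s], facts)
                if new_query is not None:
                    queries[s] = new_query
    return facts
```

The `max_steps` bound keeps the loop finite even when a sub-task can never be resolved, in which case it is simply absent from the returned fact set.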

4. Empirical Validation and Quantitative Results

Experiments across QA, retrieval, and program-repair benchmarks report gains for REPAIR variants over standard or naively adaptive RAG:

Method	Dataset	Reported result	Reference
PAIRS	QA (six sets)	+1.1 EM, +1.0 F1; –25% retrieval calls	(Chen et al., 6 Aug 2025)
Plan*RAG	HotpotQA	Acc/F1: 35.67/39.68 (vs. 25.49/31.22 for RAG)	(Verma et al., 2024)
REAP	HotpotQA	+4.6 F1, +3.2 CEM over R1-Searcher	(Zhu et al., 13 Nov 2025)
NAR-REPAIR	BRIGHT/HotpotQA	nDCG@10 +5.6 pt., EM/F1 32.2/43.0	(Kim et al., 8 Jan 2026)
LSGen	LPR-Bench	Acc 91.4% (vs. ≤40% best baseline)	(Dai et al., 13 Jan 2026)

Ablation studies attribute performance to components such as plan-guided retrieval, dual-path scoring, or recursive replanning. Step-adaptive retrieval and consistency rewards particularly improve bridge-document coverage in complex IR settings (Kim et al., 8 Jan 2026). REPAIR methods are empirically shown to (a) reduce retrieval rounds, (b) improve final task accuracy, and (c) increase the attribution of outputs to real evidence (Godbole et al., 2024, Zhu et al., 13 Nov 2025).
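This sketch illustrates how a step-level reward could combine subgoal coverage and ranking consistency. It scores a ranking by the fraction of plan subgoals mentioned in its top-$k$ documents, plus a weighted top-$k$ overlap with the previous step's ranking. The keyword-matching coverage test, the weight `lam`, and the function name are assumptions for illustration; this is not the NAR-REPAIR reward. The consistency term is simply dropped at the first step, when no ranking history exists.

```python
from typing import Optional


def step_reward(
    ranking: list[str],
    prev_ranking: Optional[list[str]],
    subgoals: list[str],
    docs: dict[str, str],
    k: int = 3,
    lam: float = 0.5,
) -> float:
    """Hypothetical reward: subgoal coverage of the top-k plus lam * top-k overlap."""
    top = ranking[:k]
    covered = sum(
        1 for g in subgoals if any(g.lower() in docs[d].lower() for d in top)
    )
    coverage = covered / len(subgoals) if subgoals else 0.0
    if prev_ranking is None:
        # No history at the first step, so the consistency term is undefined.
        return coverage
    consistency = len(set(top) & set(prev_ranking[:k])) / k
    return coverage + lam * consistency
```

The first-step special case mirrors the concern that consistency signals built on ranking history are weakest early in the process.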

5. Practical Applications and Domain-Specific Variants

REPAIR is deployed in multiple domains:

Knowledge-Intensive QA uses REPAIR to issue retrieval queries only when necessary, favoring parametric LM knowledge when it suffices and minimizing spurious retrieval (Chen et al., 6 Aug 2025).
Multi-Hop and Complex Reasoning relies on plan-then-retrieve strategies to maintain reasoning soundness across multiple dependencies, using DAGs or recursive planners to facilitate evidence gathering and mitigate context window overflow (Verma et al., 2024, Zhu et al., 13 Nov 2025).
Program Repair uses edit-driven, plan-guided retrieval, treating human submission traces as retrieval cues and iteratively refining edits based on failed test outcomes (Dai et al., 13 Jan 2026).
Grounded Long-Form Generation demonstrates reductions in hallucination by structuring long-form content into outline-driven plans, then retrieving per-segment evidence (Godbole et al., 2024).
Knowledge Graph QA integrates explicit plan-retrieval interleaving to manage incomplete KGs, switching between KG and web retrieval based on learned RL policies (Song et al., 23 Oct 2025).
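Edit-driven retrieval for program repair can be illustrated with a deliberately crude edit representation. Each edit is a bag of inserted (`+tok`) and deleted (`-tok`) whitespace tokens between two code versions, and edits are compared by cosine similarity. This is a hypothetical stand-in for the edit vectors described for LSGen; the tokenizer, the similarity measure, and the function names are all assumptions.

```python
import math
from collections import Counter


def edit_vector(before: str, after: str) -> Counter:
    """Bag of inserted (+tok) and deleted (-tok) whitespace tokens."""
    b, a = Counter(before.split()), Counter(after.split())
    vec: Counter = Counter()
    for tok, cnt in (a - b).items():
        vec["+" + tok] += cnt
    for tok, cnt in (b - a).items():
        vec["-" + tok] += cnt
    return vec


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_exemplars(
    learner_edit: Counter, library: list[tuple[str, str]], k: int = 1
) -> list[tuple[str, str]]:
    """Rank (buggy, fixed) exemplars by similarity of their edit to the learner's."""
    ranked = sorted(
        range(len(library)),
        key=lambda i: (-cosine(learner_edit, edit_vector(*library[i])), i),
    )
    return [library[i] for i in ranked[:k]]
```

For instance, a learner edit that replaces `=` with `==` in a condition retrieves an exemplar containing the same kind of fix, regardless of variable names, because only the changed tokens enter the vector.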

6. Limitations, Pitfalls, and Future Directions

While REPAIR variants report consistent performance improvements, their limitations include:

Propagation of Plan Errors: Mistakes or omissions in the generated reasoning plan can misdirect adaptive retrieval (cf. error propagation in NAR-REPAIR; Kim et al., 8 Jan 2026).
Reward Instabilities: Consistency-based rewards dependent on sufficient history may be unreliable in early iterations (Kim et al., 8 Jan 2026).
Cost Overheads: Multi-stage plan/retrieve/generate loops may increase inference latency and resource usage, particularly in long-form or multi-hop tasks (Godbole et al., 2024, Verma et al., 2024).
Limited Generalization: Current implementations are often tailored to QA, program repair, or text generation; generalization to highly open-ended or creative tasks is underexplored (Godbole et al., 2024).
Evidence Attribution Proxies: Automated attribution (e.g., AutoAIS) can imperfectly reflect human-groundedness, and some external retrievals rely on non-reproducible ranking (e.g., Google WebSearch) (Godbole et al., 2024).

Suggested directions include joint optimization of planners and retrievers, dynamic replanning, user-in-the-loop correction, plan distillation into smaller models, and extension to broader generative tasks (Kim et al., 8 Jan 2026, Godbole et al., 2024).

7. Relationship to Contemporaneous and Preceding Work

REPAIR generalizes and unifies disparate research on adaptive retrieval, plan-driven reasoning, and dual-path document selection in LLM-augmented systems. It differs from static RAG, chain-of-thought only, or post hoc reranking approaches by embedding planning as a first-class citizen within retrieval control. The framework subsumes early dual-path retrieval (Chen et al., 6 Aug 2025), dynamic planning (Zhu et al., 13 Nov 2025), program repair via exemplars (Dai et al., 13 Jan 2026), and dense feedback-guided IR (Kim et al., 8 Jan 2026), and establishes a methodological foundation for efficient, attribution-focused, and reasoning-consistent large-scale information synthesis.