Plan*RAG: Structured Planning in RAG Pipelines

Updated 18 December 2025
  • Plan*RAG is a family of frameworks that introduce symbolic planning into RAG pipelines, decomposing complex queries into atomic sub-problems.
  • It employs methodologies like DAG-based planning and state-transition models to guide tailored retrieval and generation steps.
  • Plan*RAG enhances performance in multi-hop QA, decision-making, medical planning, and code generation by reducing error propagation and improving interpretability.

Plan*RAG is a collective term for a family of frameworks and methodologies that explicitly introduce symbolic or structured planning into Retrieval-Augmented Generation (RAG) pipelines. Plan*RAG architectures decompose complex tasks into sequenced, often atomic, sub-problems before invoking retrieval, aiming to systematically mitigate error propagation and reasoning drift in knowledge-intensive tasks such as multi-hop question answering, code generation, decision-making, and domain-specific planning. The approach has been realized across a variety of domains (e.g., text QA, embodied AI, software engineering, medical planning), and its instantiations share a characteristic “plan-then-retrieve” or “plan-augmented-retrieve” pattern, in contrast with the earlier “retrieve-then-generate” RAG paradigm. Recent Plan*RAG instantiations demonstrate improved accuracy, interpretability, and robustness across zero-shot and domain-adapted settings.

1. Core Principles and Motivations

Plan*RAG introduces planning as an explicit, structured intermediary between the user’s query and the retrieval-augmented generation steps. Whereas classic RAG pipelines condition generation directly on retrieved knowledge chunks, Plan*RAG approaches first produce a formalized plan—such as a reasoning DAG, sequential subgoals, or domain-specific pseudocode—that then guides or parameterizes retrieval:

  • Query Decomposition: The input is decomposed into atomic, typically single-hop, sub-queries or reasoning steps that collectively define the high-level reasoning path required to answer the original query (Verma et al., 28 Oct 2024, Zhang et al., 25 Feb 2025, Zhang et al., 23 Apr 2025, Lyu et al., 21 Jun 2024).
  • Planning Outside LM Context: The reasoning plan is generated and maintained external to the LLM’s context window, circumventing context-length limitations associated with in-context chain-of-thought prompting. For example, “Plan*RAG” formalizes the plan as a Directed Acyclic Graph (DAG) whose sub-nodes correspond to atomic queries or facts (Verma et al., 28 Oct 2024).
  • Error Localization and Attribution: By explicitly separating planning, retrieval, and aggregation, Plan*RAG architectures enable targeted verification and multi-granularity consistency checks at each reasoning step (Zhang et al., 23 Apr 2025).
  • Adaptivity and Efficiency: Modular design allows for plug-and-play integration with a range of retrievers, generators, and evaluators, and supports parallel execution of plan steps (Verma et al., 28 Oct 2024).

This paradigm addresses major RAG limitations: fragmented reasoning chains, context overflow, error compounding across multi-hop tasks, and an inability to attribute final outputs to discrete evidentiary supports.
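To make the decomposition step concrete, the following is a minimal sketch of how a planner might elicit atomic sub-queries with explicit dependencies from an LLM before any retrieval occurs. The prompt wording, the JSON schema, and the `build_decompose_prompt`/`decompose` names are illustrative assumptions, not the interface of any cited system.

```python
import json

def build_decompose_prompt(question: str) -> str:
    # Ask the model for atomic, single-hop sub-queries with explicit dependencies.
    return (
        "Decompose the question into atomic, single-hop sub-queries.\n"
        'Return a JSON list of objects: {"id": int, "question": str, "depends_on": [int]}.\n'
        "Question: " + question
    )

def decompose(question: str, call_llm) -> list[dict]:
    """Produce a dependency-annotated plan (a DAG in adjacency-list form).

    call_llm is a placeholder for any chat-completion client that returns text.
    """
    return json.loads(call_llm(build_decompose_prompt(question)))

# Expected plan shape for "Who directed the film that won Best Picture in 1998?":
# [{"id": 1, "question": "Which film won Best Picture in 1998?", "depends_on": []},
#  {"id": 2, "question": "Who directed {0}?", "depends_on": [1]}]
```

Because dependencies are explicit, downstream steps can inject parent answers into their sub-queries, which is what keeps each retrieval call single-hop.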

2. Representative Plan*RAG Architectures

A diverse range of Plan*RAG systems has been developed, all sharing a “plan–retrieve–generate” backbone with domain-specific extensions:

| System | Planning Mechanism | Retrieval Strategy | Domain/Application |
|---|---|---|---|
| Plan*RAG (Verma et al., 28 Oct 2024) | Test-time DAG generation, atomic subqueries | Per-node retrieval, Relevance/Critic experts | Multi-hop QA |
| PAR RAG (Zhang et al., 23 Apr 2025) | Top-down plan decomposition, JSON trace | Multi-granularity (coarse + fine) per sub-question | Multi-hop QA |
| LevelRAG (Zhang et al., 25 Feb 2025) | Symbolic high-level searcher, iterative logic planning | Hybrid (sparse/dense/web), query rewriting | QA (single/multi-hop) |
| RPG (Lyu et al., 21 Jun 2024) | Iterative plan-token prediction, plan–answer cycles | Fine-grained selection, multi-task prompt tuning | Knowledge-intensive QA |
| PlanRAG (Lee et al., 18 Jun 2024) | Explicit subgoal generation for decision analysis | Plan-driven SQL/Cypher queries | Decision-making QA |
| PERC (Yoo et al., 17 Dec 2024) | Pseudocode plan-based retrieval, plan as query | Semantic retrieval over plan representations | Code generation |
| ThreatLens (Saha et al., 11 May 2025) | Multi-agent LLM planners for threat/policy/test plan | Vector RAG, iterative user–agent loop | Hardware security |
| MedPlan (Hsu et al., 23 Mar 2025) | Strict SOAP-inspired (Assessment → Plan) planning | Plan- and history-level retrieval | Medical plan generation |
| P-RAG (Xu et al., 17 Sep 2024) | Progressive, iterative plan–retrieve cycles | Scene + goal similarity, growing DB | Embodied task planning |
| Plan+RAG-Code (Bassamzadeh et al., 15 Aug 2024) | DSL plan structuring, function- and few-shot retrieval | API/function metadata + example code | NL to DSL/Automation |

A common pattern is sequential or iterative execution: a plan is composed (by the LLM or an auxiliary planner), each plan element triggers a tailored retrieval-and-generation step (often re-ranked or filtered for relevance), results are aggregated, and, when required, verifiers or critics trigger plan revision or error correction.
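A minimal sketch of this shared backbone follows, assuming hypothetical `plan_fn`, `retrieve_fn`, `generate_fn`, `verify_fn`, and `synthesize_fn` components rather than any specific system's API:

```python
from typing import Callable

def plan_rag_answer(
    question: str,
    plan_fn: Callable,       # question -> list of (step_id, template, parent_ids)
    retrieve_fn: Callable,   # query -> list of passages
    generate_fn: Callable,   # (query, passages) -> answer string
    verify_fn: Callable,     # (query, passages, answer) -> bool
    synthesize_fn: Callable, # (question, {step_id: answer}) -> final answer
) -> str:
    answers: dict = {}
    for step_id, template, parents in plan_fn(question):
        # Instantiate the sub-query by injecting answers of completed parent
        # steps (templates use positional placeholders for parent answers).
        query = template.format(*[answers[p] for p in parents])
        passages = retrieve_fn(query)           # tailored retrieval per step
        answer = generate_fn(query, passages)
        if not verify_fn(query, passages, answer):
            # Single retry with an answer-augmented query; real systems may
            # re-plan or invoke a dedicated review module instead.
            passages = retrieve_fn(query + " " + answer)
            answer = generate_fn(query, passages)
        answers[step_id] = answer
    return synthesize_fn(question, answers)     # final aggregation step
```

The loop body is where the surveyed systems diverge: PAR RAG swaps the retry for a multi-passage citation-overlap review, while DAG-based variants replace the sequential `for` loop with level-parallel execution.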

3. Formal Models and Algorithmic Structures

Most Plan*RAG systems model planning as a symbolic or partially symbolic process:

  • DAG-Based Reasoning (Plan*RAG) (Verma et al., 28 Oct 2024): The plan is a DAG $\mathcal{G} = (V, E)$ with nodes $q_i$ corresponding to atomic queries. For each $q$, the system instantiates the sub-query $\tilde{q}$ by injecting the answers of parent nodes, retrieves supporting documents, and generates the sub-answer $G(q)$. Parallelization is enabled for nodes at the same depth (see the execution sketch after this list).
  • State-Transition Model (LevelRAG) (Zhang et al., 25 Feb 2025): High-level planning states $s_t$ track both the set of resolved subqueries and their interim summaries; actions include decomposition (“decompose”), summarization, verification, and supplementation. Search terminates when all subqueries’ summaries are judged sufficient for final answer synthesis.
  • Plan-then-Act-and-Review (PAR RAG) (Zhang et al., 23 Apr 2025): The plan is a structured sequence $P = \{(\mathrm{thought}_i, q_i)\}$; at each step, coarse- and fine-grained retrievals are performed, followed by consistency checks. The Action module executes sub-queries; the Review module iteratively verifies or revises sub-answers via multi-passage citation overlap.
  • Iterative Plan–Answer Cycles (RPG) (Lyu et al., 21 Jun 2024): A plan token $p_t$ specifies the next subtopic, guiding paragraph-level retrieval for an answer segment $y_t$. Plan–answer iteration continues until output completion or early stopping.
  • Explicit Plan for Data-Driven QA (PlanRAG) (Lee et al., 18 Jun 2024): The LLM emits a stepwise plan $P$; each step is translated into SQL/Cypher database queries for observation, then the LLM integrates observations using business rules; re-planning is invoked as needed.
  • Plan-as-Query Retrieval (PERC) (Yoo et al., 17 Dec 2024): Code examples are mapped to pseudocode plans, and retrieval is performed over plan representations; retrieved examples are converted as needed to the target programming language.
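The DAG execution referenced above can be made concrete with a short sketch of level-by-level evaluation; `answer_node` is a hypothetical placeholder for the per-node instantiate–retrieve–generate step, not the published implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def execute_dag(nodes, parents, answer_node):
    """Evaluate a reasoning DAG level by level.

    nodes:       iterable of node ids (atomic sub-queries)
    parents:     dict mapping node id -> list of parent node ids
    answer_node: hypothetical callable (node_id, parent_answers) -> answer;
                 expected to instantiate the sub-query, retrieve, and generate.
    """
    answers, remaining = {}, set(nodes)
    while remaining:
        # Nodes whose parents are all resolved form the next depth level.
        level = [n for n in remaining if all(p in answers for p in parents[n])]
        if not level:
            raise ValueError("cycle detected: plan is not a DAG")
        with ThreadPoolExecutor() as pool:  # same-depth nodes run in parallel
            results = pool.map(
                lambda n: (n, answer_node(n, [answers[p] for p in parents[n]])),
                level,
            )
            answers.update(dict(results))
        remaining -= set(level)
    return answers
```

Because each level depends only on already-resolved parents, this structure directly yields the parallelism and bounded per-node context that the DAG formulation is designed for.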

4. Domain-Specific Adaptations and Applications

Plan*RAG methodology has been adapted for a range of knowledge-intensive settings:

  • Multi-hop and Long-form Question Answering: Direct evidence for improved multi-step reasoning fidelity, error localization, and overall accuracy. For example, Plan*RAG (Verma et al., 28 Oct 2024) improves HotpotQA accuracy from 25.49% (standard RAG) to 35.67% and F1 from 31.22 to 39.68, while PAR RAG (Zhang et al., 23 Apr 2025) achieves relative EM/F1 uplifts of +31.6% and +37.9% over state-of-the-art baselines on HotpotQA and MuSiQue.
  • Decision-Making over Structured Data: PlanRAG outperforms prior iterative RAG by +15.8 pp in Locating and +7.4 pp in Building scenarios on the Decision QA benchmark (Lee et al., 18 Jun 2024).
  • Medical Plan Generation: MedPlan’s “Plan × RAG” mirrors clinician workflow by first producing an assessment, then using retrieved cross-patient plus self-history SOAP records to generate personalized treatment plans, yielding BLEU of up to 0.3183 and METEOR of up to 0.5213 with Medical-Mixtral-7B-v2k (Hsu et al., 23 Mar 2025).
  • Threat Modeling and Hardware Verification: ThreatLens employs multi-agent planners (threat, policy, plan generation) with RAG, reducing manual effort ~75% and achieving 92% precision in threat filtering on NEORV32 SoC (Saha et al., 11 May 2025).
  • Code and DSL Generation: PERC’s plan-as-query retrieval outperforms code retrieval baselines in both in- and cross-language settings, e.g., in MultiPL-E, Ruby: 67.27% (RepoCoder) → 69.81%, Lua: 60.81% → 64.10% (Yoo et al., 17 Dec 2024); Plan+RAG for DSL generation matches fine-tuned baselines in-domain and exceeds them by +7 pts similarity on out-of-domain APIs (Bassamzadeh et al., 15 Aug 2024).
  • Embodied AI: P-RAG’s iterative, database-augmented planning improves unseen-task success rates on ALFRED: GPT-4 without retrieval reaches 7.05%, P-RAG reaches 14.11% after three iterations, and self-iteration lifts this up to 27.4% (Xu et al., 17 Sep 2024); a simplified sketch of this progressive loop follows this list.
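A simplified sketch of such a progressive loop is shown below, assuming hypothetical `embed_fn`, `plan_fn`, and `execute_fn` components and dictionary-valued tasks with "goal" and "scene" text fields; it illustrates the growing-database idea, not the paper's code.

```python
def dot(a, b):
    # Plain dot-product similarity; a real system would use a vector index.
    return sum(x * y for x, y in zip(a, b))

def progressive_planning(tasks, embed_fn, plan_fn, execute_fn, top_k=3):
    """P-RAG-style progressive loop (simplified sketch).

    The trajectory database starts empty and grows with every completed task,
    so later tasks retrieve progressively richer experience.
    """
    db = []  # list of (embedding, trajectory) pairs
    for task in tasks:
        key = embed_fn(task["goal"] + " " + task["scene"])
        # Retrieve the most similar past trajectories as planning context.
        neighbours = sorted(db, key=lambda e: -dot(key, e[0]))[:top_k]
        plan = plan_fn(task, [traj for _, traj in neighbours])
        trajectory = execute_fn(task, plan)
        db.append((key, trajectory))  # database grows across iterations
    return db
```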

5. Theoretical and Empirical Impact

Plan*RAG approaches advance RAG systems by:

  • Reducing Error Propagation: Top-down planning, coupled with per-step verification, prevents local retrieval/generation failures from corrupting downstream reasoning.
  • Improving Attribution: Atomic subqueries, each linked to a discrete retrieved document, provide strong evidence traceability—76% of answers are exact substrings of the retrieved doc in Plan*RAG (Verma et al., 28 Oct 2024).
  • Enhancing Modular Integration: Plug-and-play design supports deployment atop arbitrary LLMs, retrievers (BM25, DPR, Contriever), and verification modules, requiring minimal or no model fine-tuning (Verma et al., 28 Oct 2024, Zhang et al., 25 Feb 2025).
  • Enabling Scalability and Efficiency: Parallel plan-step execution and context-bounded node retrieval mitigate context-window overflow and reduce unnecessary retrievals, as with the Critic Expert in Plan*RAG (retrieval calls reduced by 19% with negligible accuracy loss) (Verma et al., 28 Oct 2024); a minimal gate of this kind is sketched after this list.
  • Performance Gains: Across domains, Plan*RAG variants match or exceed proprietary models (e.g., LevelRAG surpasses GPT-4o and ReSP) (Zhang et al., 25 Feb 2025), show significant uplift over vanilla one-pass RAG, and exhibit enhanced generalization to low-resource regimes.
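The retrieval-saving idea behind the Critic Expert can be illustrated as a simple gate; `critic_fn` below is a hypothetical stand-in for the learned critic, not the published implementation.

```python
def answer_subquery(query, context, critic_fn, retrieve_fn, generate_fn):
    """Critic-gated retrieval: call the retriever only when a lightweight
    check judges the already-gathered context insufficient for this sub-query.

    critic_fn(query, context) -> True when fresh retrieval is deemed necessary.
    """
    if critic_fn(query, context):
        context = context + retrieve_fn(query)  # fetch only when needed
    return generate_fn(query, context)
```

Skipping retrieval for sub-queries that are already answerable from parent answers or prior context is what a gate of this kind trades against accuracy, consistent with the reported reduction in retrieval calls.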

6. Limitations, Challenges, and Future Directions

Notable limitations identified in Plan*RAG research include:

  • Computational Overhead: Multi-step planning, per-step retrieval, review modules, and verification add latency (e.g., PAR RAG average RTPQ ≈ 26s) and increase inference cost (Zhang et al., 23 Apr 2025).
  • Planning Quality Sensitivity: Poor initial plan decomposition or specification can cause retrieval to miss critical evidence or narrow the search space excessively (Lyu et al., 21 Jun 2024, Yoo et al., 17 Dec 2024).
  • Database Scalability and Memory: Progressive accumulation of trajectories or intermediate plans may cause database growth and potential retrieval efficiency degradation, as seen in P-RAG (Xu et al., 17 Sep 2024).
  • Limits of Current LLM Reasoners: P-RAG and similar systems plateau as LLM reasoning capabilities (particularly for embodied, non-textual tasks) saturate (Xu et al., 17 Sep 2024).
  • Automatic Plan Extraction: The quality of LLM-generated plans or pseudocode may be a failure point (e.g., erroneous plan steps, unreliable pseudocode extraction in PERC) (Yoo et al., 17 Dec 2024).

Ongoing work investigates learned retriever/reranker modules, adaptive granularity planning, plan critics or quality validators, efficient memory condensation, and cross-modal plan representations (e.g., integrating vision directly in embodied settings). A plausible implication is that tighter coupling between learned planning agents and retrieval subsystems, or joint end-to-end optimization as in trainable consistency/verifier modules, could further enhance accuracy and robustness.

7. Summary Table: Plan*RAG Systems and Key Features

| System | Planning | Retrieval | Result/Claim | Reference |
|---|---|---|---|---|
| Plan*RAG | Test-time DAG | Per-node, atomic | +2–6 Acc/F1 on multi-hop QA | (Verma et al., 28 Oct 2024) |
| LevelRAG | Symbolic searcher | Hybrid (sparse/dense/web) | Outperforms GPT-4o; F1 up to 69.33% | (Zhang et al., 25 Feb 2025) |
| PAR RAG | JSON plan, review | Multi-granular | +31.6% EM over baseline on HotpotQA | (Zhang et al., 23 Apr 2025) |
| MedPlan | SOAP plan | Patient + history | BLEU up to 0.3183; 66% ↑ clinical eval | (Hsu et al., 23 Mar 2025) |
| ThreatLens | Multi-agent plan | Vector, iterative | 75% manual effort ↓; 92% precision | (Saha et al., 11 May 2025) |
| PERC | Pseudocode plan | Plan-as-query | +1–5 pp Pass@1 on underrepresented PLs | (Yoo et al., 17 Dec 2024) |
| PlanRAG | Stepwise plan | SQL/Cypher generation | +15.8 pp / +7.4 pp accuracy on DQA | (Lee et al., 18 Jun 2024) |
| RPG | Plan token per step | Paragraph selection | +8.5 F1 (2Wiki); +9.1 ROUGE (ASQA) | (Lyu et al., 21 Jun 2024) |
| P-RAG | Progressive planning | History + goal/scene | +7 pp SR (ALFRED: 7.05% → 14.11% → 27.4%) | (Xu et al., 17 Sep 2024) |
| Plan+RAG-Code | DSL plan, function | Example + API metadata | +7 pts similarity on OOD API DSL generation | (Bassamzadeh et al., 15 Aug 2024) |
