Plan*RAG: Structured Planning in RAG Pipelines
- Plan*RAG is a family of frameworks that introduce symbolic planning into RAG pipelines, decomposing complex queries into atomic sub-problems.
- It employs methodologies like DAG-based planning and state-transition models to guide tailored retrieval and generation steps.
- Plan*RAG enhances performance in multi-hop QA, decision-making, medical planning, and code generation by reducing error propagation and improving interpretability.
Plan*RAG is a collective term for a family of frameworks and methodologies that explicitly introduce symbolic or structured planning into Retrieval-Augmented Generation (RAG) pipelines. Plan*RAG architectures decompose complex tasks into sequenced, often atomic, sub-problems before invoking retrieval, aiming to systematically mitigate error propagation and reasoning drift in knowledge-intensive tasks such as multi-hop question answering, code generation, decision-making, and domain-specific planning. The approach has been realized in a variety of domains (e.g., text QA, embodied AI, software engineering, medical planning) and shares a characteristic “plan-then-retrieve” or “plan-augmented-retrieve” pattern, in contrast to the earlier “retrieve-then-generate” RAG paradigm. Recent Plan*RAG instantiations demonstrate improved accuracy, interpretability, and robustness across zero-shot and domain-adapted settings.
1. Core Principles and Motivations
Plan*RAG introduces planning as an explicit, structured intermediary between the user’s query and the retrieval-augmented generation steps. Whereas classic RAG pipelines condition generation directly on retrieved knowledge chunks, Plan*RAG approaches first produce a formalized plan—such as a reasoning DAG, sequential subgoals, or domain-specific pseudocode—that then guides or parameterizes retrieval:
- Query Decomposition: The input is decomposed into atomic, typically single-hop, sub-queries or reasoning steps that collectively define the high-level reasoning path required to answer the original query (Verma et al., 28 Oct 2024, Zhang et al., 25 Feb 2025, Zhang et al., 23 Apr 2025, Lyu et al., 21 Jun 2024).
- Planning Outside LM Context: The reasoning plan is generated and maintained external to the LLM’s context window, circumventing context-length limitations associated with in-context chain-of-thought prompting. For example, “Plan*RAG” formalizes the plan as a Directed Acyclic Graph (DAG) whose sub-nodes correspond to atomic queries or facts (Verma et al., 28 Oct 2024).
- Error Localization and Attribution: By explicitly separating planning, retrieval, and aggregation, Plan*RAG architectures enable targeted verification and multi-granularity consistency checks at each reasoning step (Zhang et al., 23 Apr 2025).
- Adaptivity and Efficiency: Modular design allows for plug-and-play integration with a range of retrievers, generators, and evaluators, and supports parallel execution of plan steps (Verma et al., 28 Oct 2024).
This paradigm addresses major RAG limitations: fragmented reasoning chains, context overflow, error compounding across multi-hop tasks, and an inability to attribute final outputs to discrete evidentiary supports.
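The plan-then-retrieve pattern described above can be sketched in a few lines. This is a minimal illustration, not any paper’s implementation: `decompose`, `retrieve`, and `generate` are hypothetical stand-ins for an LLM planner, a retriever, and a generator.

```python
# Minimal plan-then-retrieve sketch. The three helpers below are
# hypothetical placeholders for an LLM planner, a retriever, and a
# generator; a real system would back each with a model or index.

def decompose(query):
    """Planner: split a multi-hop query into atomic sub-queries."""
    # A real system would call an LLM here; we return a fixed plan.
    return [
        "Who directed the film Inception?",
        "What other films has that director made?",
    ]

def retrieve(sub_query, k=3):
    """Retriever: fetch top-k evidence passages for one sub-query."""
    return [f"passage about: {sub_query}"][:k]

def generate(sub_query, evidence, prior_answers):
    """Generator: answer one sub-query from its own evidence."""
    return f"answer({sub_query})"

def plan_then_retrieve(query):
    plan = decompose(query)      # 1. plan before any retrieval
    answers = []
    for step in plan:            # 2. one tailored retrieval per step
        evidence = retrieve(step)
        answers.append(generate(step, evidence, answers))
    return answers[-1]           # 3. final answer from the last step

final = plan_then_retrieve("What films were made by Inception's director?")
```

The key contrast with classic RAG is the ordering: the plan fixes the reasoning path first, and each retrieval is conditioned on one atomic step rather than on the full query.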
2. Representative Plan*RAG Architectures
A diverse range of Plan*RAG systems has been developed, all sharing a “plan–retrieve–generate” backbone with domain-specific extensions:
| System | Planning Mechanism | Retrieval Strategy | Domain/Application |
|---|---|---|---|
| Plan*RAG (Verma et al., 28 Oct 2024) | Test-time DAG generation, atomic subqueries | Per-node retrieval, Relevance/Critic experts | Multi-hop QA |
| PAR RAG (Zhang et al., 23 Apr 2025) | Top-down plan decomposition, JSON trace | Multi-granularity (coarse+fine) per sub-question | Multi-hop QA |
| LevelRAG (Zhang et al., 25 Feb 2025) | Symbolic high-level searcher, iterative logic planning | Hybrid (sparse/dense/web), query rewriting | QA (single/multi-hop) |
| RPG (Lyu et al., 21 Jun 2024) | Iterative plan-token prediction, plan–answer cycles | Fine-grained selection, multi-task prompt tuning | Knowledge-intensive QA |
| PlanRAG (Lee et al., 18 Jun 2024) | Explicit subgoal generation for decision analysis | Plan-driven SQL/Cypher queries | Decision-making QA |
| PERC (Yoo et al., 17 Dec 2024) | Pseudocode plan-based retrieval, plan as query | Semantic retrieval over plan representations | Code generation |
| ThreatLens (Saha et al., 11 May 2025) | Multi-agent LLM planners for threat/policy/test plan | Vector RAG, iterative user–agent loop | Hardware security |
| MedPlan (Hsu et al., 23 Mar 2025) | Strict SOAP-inspired (Assessment→Plan) planning | Plan- and history-level retrieval | Medical plan generation |
| P-RAG (Xu et al., 17 Sep 2024) | Progressive, iterative plan–retrieve cycles | Scene+goal similarity, growing DB | Embodied task planning |
| Plan+RAG-Code (Bassamzadeh et al., 15 Aug 2024) | DSL plan structuring, function- and few-shot retrieval | API/function metadata + example code | NL to DSL/Automation |
A common pattern is sequential or iterative execution: a plan is composed (by the LLM or auxiliary planner), each plan element triggers a tailored retrieval and generation step (often re-ranked or filtered for relevance), results are aggregated, and—when required—verifiers or critics introduce plan revision or error correction.
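This common execution loop can be sketched generically. The sketch below is illustrative, assuming hypothetical `retrieve`, `generate`, `verify`, and `revise` modules; it shows the shared pattern of per-step retrieval with a bounded verify-and-revise cycle, not any single system’s algorithm.

```python
# Shared Plan*RAG execution loop: for each plan step, retrieve and
# generate, then let a verifier trigger at most `max_revisions`
# sub-query revisions before moving on. All module arguments are
# hypothetical placeholders for LLM/retriever components.

def run_plan(plan, retrieve, generate, verify, revise, max_revisions=1):
    results = []
    for step in plan:
        answer = None
        for _ in range(max_revisions + 1):
            docs = retrieve(step)
            answer = generate(step, docs, results)
            if verify(step, answer, docs):   # consistency/citation check
                break
            step = revise(step, answer)      # rewrite sub-query and retry
        results.append(answer)
    return results

# Dummy modules: every step verifies on the first try.
steps = run_plan(
    ["q1", "q2"],
    retrieve=lambda s: [f"doc:{s}"],
    generate=lambda s, d, prior: f"a:{s}",
    verify=lambda s, a, d: True,
    revise=lambda s, a: s + "'",
)
```

Systems differ mainly in what fills each slot: PAR RAG’s Review module acts as `verify`, LevelRAG’s supplementation as `revise`, and so on.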
3. Formal Models and Algorithmic Structures
Most Plan*RAG systems model planning as a symbolic or partially symbolic process:
- DAG-Based Reasoning (Plan*RAG) (Verma et al., 28 Oct 2024): The plan is a DAG G = (V, E) whose nodes q_i correspond to atomic queries. For each node q_i, the system instantiates the sub-query by injecting the answers of its parent nodes, retrieves supporting documents D_i, and generates the sub-answer a_i. Parallelization is enabled for nodes at the same depth.
- State-Transition Model (LevelRAG) (Zhang et al., 25 Feb 2025): High-level planning states track both the set of resolved subqueries and their interim summaries; actions include decomposition (“decompose”), summarization, verification, and supplementation. Search is terminated when all subqueries’ summaries are judged sufficient for final answer synthesis.
- Plan-then-Act-and-Review (PAR RAG) (Zhang et al., 23 Apr 2025): The plan is a structured sequence of sub-questions (s_1, …, s_n); at each step s_i, coarse- and fine-grained retrievals are performed, followed by consistency checks. The Action module executes sub-queries; the Review module iteratively verifies or revises sub-answers via multi-passage citation overlap.
- Iterative Plan–Answer Cycles (RPG) (Lyu et al., 21 Jun 2024): A plan token t_i specifies the next subtopic, guiding paragraph-level retrieval for an answer segment a_i. Plan–answer iteration continues until output completion or early stopping.
- Explicit Plan for Data-Driven QA (PlanRAG) (Lee et al., 18 Jun 2024): The LLM emits a stepwise plan (p_1, …, p_k); each step is translated into SQL/Cypher database queries for observation, then the LLM integrates observations using business rules; re-planning is invoked as needed.
- Plan-as-Query Retrieval (PERC) (Yoo et al., 17 Dec 2024): Code examples are mapped to pseudocode plans, and retrieval is performed over plan representations; retrieved examples are converted as needed to the target programming language.
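The DAG-based model is the most algorithmically distinctive of these, so a sketch may help. The following is a minimal, illustrative rendering of depth-ordered DAG execution (the DAG encoding, templates, and `answer_node` helper are assumptions of this sketch, not the paper’s code): nodes at the same depth have no mutual dependencies, so each level could be executed in parallel.

```python
# Depth-ordered execution of a reasoning DAG: group nodes into levels
# so that every parent sits in an earlier level, then instantiate each
# sub-query by substituting its parents' answers.
from collections import defaultdict

def depth_levels(parents):
    """Group nodes into levels; all of a node's parents come earlier."""
    depth = {}
    def d(n):
        if n not in depth:
            depth[n] = 1 + max((d(p) for p in parents[n]), default=-1)
        return depth[n]
    levels = defaultdict(list)
    for n in parents:
        levels[d(n)].append(n)
    return [sorted(levels[k]) for k in sorted(levels)]

def run_dag(parents, template, answer_node):
    answers = {}
    for level in depth_levels(parents):   # each level is parallelizable
        for node in level:
            # Instantiate the sub-query with its parents' answers.
            query = template[node].format(
                **{p: answers[p] for p in parents[node]})
            answers[node] = answer_node(query)
    return answers

parents = {"q1": [], "q2": [], "q3": ["q1", "q2"]}
template = {
    "q1": "Who directed Inception?",
    "q2": "When was it released?",
    "q3": "Combine {q1} and {q2}",
}
answers = run_dag(parents, template, lambda q: f"ans[{q}]")
```

Because the plan lives in this external graph rather than in the LLM’s context window, only one node’s sub-query and its parents’ answers need to fit in context at a time.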
4. Domain-Specific Adaptations and Applications
Plan*RAG methodology has been adapted for a range of knowledge-intensive settings:
- Multi-hop and Long-form Question Answering: These systems provide direct evidence of improved multi-step reasoning fidelity, error localization, and overall accuracy. For example, Plan*RAG (Verma et al., 28 Oct 2024) improves HotpotQA accuracy from 25.49% (standard RAG) to 35.67% and F1 from 31.22 to 39.68, while PAR RAG (Zhang et al., 23 Apr 2025) achieves relative EM/F1 uplifts of +31.6% and +37.9% over state-of-the-art baselines on HotpotQA and MuSiQue.
- Decision-Making over Structured Data: PlanRAG outperforms prior iterative RAG by +15.8 pp in Locating and +7.4 pp in Building scenarios on the Decision QA benchmark (Lee et al., 18 Jun 2024).
- Medical Plan Generation: MedPlan mirrors clinician workflow by first producing an assessment, then using retrieved cross-patient plus self-history SOAP records to generate personalized treatment plans, yielding BLEU of up to 0.3183 and METEOR of up to 0.5213 (Medical-Mixtral-7B-v2k) (Hsu et al., 23 Mar 2025).
- Threat Modeling and Hardware Verification: ThreatLens employs multi-agent planners (threat, policy, plan generation) with RAG, reducing manual effort ~75% and achieving 92% precision in threat filtering on NEORV32 SoC (Saha et al., 11 May 2025).
- Code and DSL Generation: PERC’s plan-as-query retrieval outperforms code retrieval baselines in both in- and cross-language settings, e.g., in MultiPL-E, Ruby: 67.27% (RepoCoder) → 69.81%, Lua: 60.81% → 64.10% (Yoo et al., 17 Dec 2024); Plan+RAG for DSL generation matches fine-tuned baselines in-domain and exceeds them by +7 pts similarity on out-of-domain APIs (Bassamzadeh et al., 15 Aug 2024).
- Embodied AI: P-RAG’s iterative, database-augmented planning improves unseen task success rates on ALFRED: GPT-4 (no retrieval) 7.05% → P-RAG after 3 iters 14.11%, and with self-iteration up to 27.4% (Xu et al., 17 Sep 2024).
5. Theoretical and Empirical Impact
Plan*RAG approaches advance RAG systems by:
- Reducing Error Propagation: Top-down planning, coupled with per-step verification, prevents local retrieval/generation failures from corrupting downstream reasoning.
- Improving Attribution: Atomic subqueries, each linked to a discrete retrieved document, provide strong evidence traceability—76% of answers are exact substrings of the retrieved doc in Plan*RAG (Verma et al., 28 Oct 2024).
- Enhancing Modular Integration: Plug-and-play design supports deployment atop arbitrary LLMs, retrievers (BM25, DPR, Contriever), and verification modules, requiring minimal or no model fine-tuning (Verma et al., 28 Oct 2024, Zhang et al., 25 Feb 2025).
- Enabling Scalability and Efficiency: Parallel plan step execution and context-bounded node retrieval mitigate context window overflow and reduce unnecessary retrievals, as with the Critic Expert in Plan*RAG (retrieval calls reduced by 19% with negligible accuracy loss) (Verma et al., 28 Oct 2024).
- Performance Gains: Across domains, Plan*RAG variants match or exceed proprietary models (e.g., LevelRAG surpasses GPT4o and ReSP) (Zhang et al., 25 Feb 2025), show significant performance uplift compared to vanilla one-pass RAG, and exhibit enhanced generalization to low-resource regimes.
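The efficiency point about critic-gated retrieval can be made concrete with a small sketch. This is an illustrative rendering, not the paper’s Critic Expert: a lightweight `critic` decides per step whether external retrieval is needed at all, so steps answerable from prior plan context skip the retriever entirely.

```python
# Critic-gated retrieval: consult a lightweight critic before each
# retrieval call; skip retrieval when the step is judged answerable
# from already-accumulated answers. The critic heuristic below is
# purely illustrative.

def run_with_critic(plan, retrieve, generate, critic):
    answers, calls = [], 0
    for step in plan:
        docs = []
        if critic(step, answers):   # retrieval deemed necessary
            docs = retrieve(step)
            calls += 1
        answers.append(generate(step, docs, answers))
    return answers, calls

# Toy critic: only 'lookup' steps need external evidence; 'derive'
# steps are answered from the answers accumulated so far.
plan = ["lookup:A", "derive:B", "lookup:C", "derive:D"]
answers, calls = run_with_critic(
    plan,
    retrieve=lambda s: [f"doc:{s}"],
    generate=lambda s, d, prior: f"a:{s}",
    critic=lambda s, prior: s.startswith("lookup"),
)
# calls == 2: retrieval is skipped for the two 'derive' steps
```

The reported 19% reduction in retrieval calls corresponds to exactly this kind of per-node gating, applied with a learned critic rather than a string heuristic.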
6. Limitations, Challenges, and Future Directions
Notable limitations identified in Plan*RAG research include:
- Computational Overhead: Multi-step planning, per-step retrieval, review modules, and verification add latency (e.g., PAR RAG average RTPQ ≈ 26s) and increase inference cost (Zhang et al., 23 Apr 2025).
- Planning Quality Sensitivity: Poor initial plan decomposition or specification can cause retrieval to miss critical evidence or narrow the search space excessively (Lyu et al., 21 Jun 2024, Yoo et al., 17 Dec 2024).
- Database Scalability and Memory: Progressive accumulation of trajectories or intermediate plans may cause database growth and potential retrieval efficiency degradation, as seen in P-RAG (Xu et al., 17 Sep 2024).
- Limits of Current LLM Reasoners: P-RAG and similar systems plateau as LLM reasoning capabilities (particularly for embodied, non-textual tasks) saturate (Xu et al., 17 Sep 2024).
- Automatic Plan Extraction: The quality of LLM-generated plans or pseudocode may be a failure point (e.g., erroneous plan steps, unreliable pseudocode extraction in PERC) (Yoo et al., 17 Dec 2024).
Ongoing work investigates learned retriever/reranker modules, adaptive granularity planning, plan critics or quality validators, efficient memory condensation, and cross-modal plan representations (e.g., integrating vision directly in embodied settings). A plausible implication is that tighter coupling between learned planning agents and retrieval subsystems, or joint end-to-end optimization as in trainable consistency/verifier modules, could further enhance accuracy and robustness.
7. Summary Table: Plan*RAG Systems and Key Features
| System | Planning | Retrieval | Result/Claim | Reference |
|---|---|---|---|---|
| Plan*RAG | Test-time DAG | Per-node, atomic | +2–6 Acc/F1 on multi-hop QA | (Verma et al., 28 Oct 2024) |
| LevelRAG | Symbolic searcher | Hybrid (sparse/dense/web) | Outperforms GPT4o, F1 up to 69.33% | (Zhang et al., 25 Feb 2025) |
| PAR RAG | JSON plan, review | Multi-granular | +31.6% EM over baseline on HotpotQA | (Zhang et al., 23 Apr 2025) |
| MedPlan | SOAP plan | Patient+history | BLEU up to 0.3183, 66% ↑ clinical eval | (Hsu et al., 23 Mar 2025) |
| ThreatLens | Multi-agent plan | Vector, iterative | 75% manual effort ↓, 92% precision | (Saha et al., 11 May 2025) |
| PERC | Pseudocode plan | Plan-as-query | +1–5 pp Pass@1 on underrepresented PLs | (Yoo et al., 17 Dec 2024) |
| PlanRAG | Stepwise plan | SQL/Cypher gen | +15.8pp / +7.4pp accuracy on DQA | (Lee et al., 18 Jun 2024) |
| RPG | Plan token per step | Paragraph select | +8.5 F1 (2Wiki), +9.1 ROUGE (ASQA) | (Lyu et al., 21 Jun 2024) |
| P-RAG | Progressive planning | History+goal/scene | +7% SR (ALFRED: 7.05%→14.11%→27.4%) | (Xu et al., 17 Sep 2024) |
| Plan+RAG-Code | DSL plan, function | Example+API meta | +7 pts sim on OOD API DSL generation | (Bassamzadeh et al., 15 Aug 2024) |
References
- Plan*RAG: Efficient Test-Time Planning for Retrieval Augmented Generation (Verma et al., 28 Oct 2024)
- LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers (Zhang et al., 25 Feb 2025)
- Credible plan-driven RAG method for Multi-hop Question Answering (Zhang et al., 23 Apr 2025)
- MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation (Hsu et al., 23 Mar 2025)
- ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification (Saha et al., 11 May 2025)
- PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation (Yoo et al., 17 Dec 2024)
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative LLMs as Decision Makers (Lee et al., 18 Jun 2024)
- Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation (Lyu et al., 21 Jun 2024)
- Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task (Xu et al., 17 Sep 2024)
- Plan with Code: Comparing approaches for robust NL to DSL generation (Bassamzadeh et al., 15 Aug 2024)