Plan*RAG: Structured Planning in RAG Pipelines

Updated 18 December 2025
  • Plan*RAG is a family of frameworks that introduce symbolic planning into RAG pipelines, decomposing complex queries into atomic sub-problems.
  • It employs methodologies like DAG-based planning and state-transition models to guide tailored retrieval and generation steps.
  • Plan*RAG enhances performance in multi-hop QA, decision-making, medical planning, and code generation by reducing error propagation and improving interpretability.

Plan*RAG is a collective term for a family of frameworks and methodologies that explicitly introduce symbolic or structured planning into Retrieval-Augmented Generation (RAG) pipelines. Plan*RAG architectures decompose complex tasks into sequenced, often atomic, sub-problems before invoking retrieval, aiming to systematically mitigate error propagation and reasoning drift in knowledge-intensive tasks, including multi-hop question answering, code generation, decision-making, and domain-specific planning. The approach has been realized in a variety of domains (e.g., text QA, embodied AI, software engineering, medical planning) and shares a characteristic “plan-then-retrieve” or “plan-augmented-retrieve” pattern, in contrast to earlier “retrieve-then-generate” RAG. Recent Plan*RAG instantiations demonstrate improved accuracy, interpretability, and robustness across zero-shot and domain-adapted settings.

1. Core Principles and Motivations

Plan*RAG introduces planning as an explicit, structured intermediary between the user’s query and the retrieval-augmented generation steps. Whereas classic RAG pipelines condition generation directly on retrieved knowledge chunks, Plan*RAG approaches first produce a formalized plan—such as a reasoning DAG, sequential subgoals, or domain-specific pseudocode—that then guides or parameterizes retrieval:

  • Query Decomposition: The input is decomposed into atomic, typically single-hop, sub-queries or reasoning steps that collectively define the high-level reasoning path required to answer the original query (Verma et al., 2024, Zhang et al., 25 Feb 2025, Zhang et al., 23 Apr 2025, Lyu et al., 2024).
  • Planning Outside LM Context: The reasoning plan is generated and maintained external to the LLM’s context window, circumventing context-length limitations associated with in-context chain-of-thought prompting. For example, “Plan*RAG” formalizes the plan as a Directed Acyclic Graph (DAG) whose sub-nodes correspond to atomic queries or facts (Verma et al., 2024).
  • Error Localization and Attribution: By explicitly separating planning, retrieval, and aggregation, Plan*RAG architectures enable targeted verification and multi-granularity consistency checks at each reasoning step (Zhang et al., 23 Apr 2025).
  • Adaptivity and Efficiency: Modular design allows for plug-and-play integration with a range of retrievers, generators, and evaluators, and supports parallel execution of plan steps (Verma et al., 2024).

This paradigm addresses major RAG limitations: fragmented reasoning chains, context overflow, error compounding across multi-hop tasks, and an inability to attribute final outputs to discrete evidentiary supports.
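The control flow these principles imply can be made concrete with a short sketch. The following is a minimal, illustrative skeleton of the plan-then-retrieve pattern, not any one paper's implementation; `llm` and `retriever` stand in for arbitrary generator and retriever callables, and the prompt strings are assumptions:

```python
# Minimal plan-then-retrieve skeleton. `llm` maps a prompt string to text and
# `retriever` maps (query, k) to a list of passages; both are placeholders.

def plan_then_retrieve(query: str, llm, retriever, k: int = 5) -> str:
    # Stage 1: planning. Decompose the query into atomic, single-hop
    # sub-queries; the plan is kept outside the generation context.
    plan = llm(f"Decompose into atomic sub-questions, one per line:\n{query}")
    sub_queries = [line.strip() for line in plan.splitlines() if line.strip()]

    # Stage 2: per-step retrieval and generation. Each sub-query triggers its
    # own retrieval call, so evidence stays attributable to a single step.
    sub_answers = []
    for sq in sub_queries:
        docs = "\n".join(retriever(sq, k=k))
        sub_answers.append(llm(f"Context:\n{docs}\n\nAnswer briefly: {sq}"))

    # Stage 3: aggregation. The final answer conditions on the compact
    # (sub-query, sub-answer) trace rather than on all retrieved text.
    trace = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(sub_queries, sub_answers))
    return llm(f"Given these resolved sub-questions:\n{trace}\n\nAnswer: {query}")
```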

2. Representative Plan*RAG Architectures

A diverse set of Plan*RAG systems has been developed, all sharing a “plan–retrieve–generate” backbone with domain-specific extensions:

| System | Planning Mechanism | Retrieval Strategy | Domain/Application |
| --- | --- | --- | --- |
| Plan*RAG (Verma et al., 2024) | Test-time DAG generation, atomic sub-queries | Per-node retrieval, relevance/critic experts | Multi-hop QA |
| PAR RAG (Zhang et al., 23 Apr 2025) | Top-down plan decomposition, JSON trace | Multi-granularity (coarse + fine) per sub-question | Multi-hop QA |
| LevelRAG (Zhang et al., 25 Feb 2025) | Symbolic high-level searcher, iterative logic planning | Hybrid (sparse/dense/web), query rewriting | QA (single/multi-hop) |
| RPG (Lyu et al., 2024) | Iterative plan-token prediction, plan–answer cycles | Fine-grained selection, multi-task prompt tuning | Knowledge-intensive QA |
| PlanRAG (Lee et al., 2024) | Explicit subgoal generation for decision analysis | Plan-driven SQL/Cypher queries | Decision-making QA |
| PERC (Yoo et al., 2024) | Pseudocode plan-based retrieval, plan as query | Semantic retrieval over plan representations | Code generation |
| ThreatLens (Saha et al., 11 May 2025) | Multi-agent LLM planners for threat/policy/test plans | Vector RAG, iterative user–agent loop | Hardware security |
| MedPlan (Hsu et al., 23 Mar 2025) | Strict SOAP-inspired (Assessment→Plan) planning | Plan- and history-level retrieval | Medical plan generation |
| P-RAG (Xu et al., 2024) | Progressive, iterative plan–retrieve cycles | Scene + goal similarity, growing DB | Embodied task planning |
| Plan+RAG-Code (Bassamzadeh et al., 2024) | DSL plan structuring, function- and few-shot retrieval | API/function metadata + example code | NL-to-DSL/automation |

A common pattern is sequential or iterative execution: a plan is composed (by the LLM or an auxiliary planner); each plan element triggers a tailored retrieval and generation step, often re-ranked or filtered for relevance; results are aggregated; and, when required, verifiers or critics trigger plan revision or error correction.
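As a concrete illustration of this loop, below is a minimal sketch in the spirit of PAR RAG's plan–act–review cycle. The `verify` predicate (e.g., checking citation overlap between a sub-answer and its retrieved passages) and the simple query-revision policy are simplifying assumptions, not the published algorithm:

```python
# Sketch of a plan-execute-review loop. All callables are placeholders for
# whatever retriever, generator, and verifier a given system uses.
from typing import Callable, List, Tuple

def execute_with_review(
    plan: List[Tuple[str, str]],            # (thought_i, sub_query_i) pairs
    retrieve: Callable[[str], List[str]],   # sub-query -> supporting passages
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, List[str]], bool],
    max_revisions: int = 2,
) -> List[str]:
    answers = []
    for thought, sub_query in plan:
        query = sub_query
        for _ in range(max_revisions + 1):
            passages = retrieve(query)       # per-step (coarse+fine) retrieval
            answer = generate(query, passages)
            if verify(answer, passages):     # e.g., citation/passage overlap
                break
            # On failure, fold the plan's rationale into a revised sub-query
            # and retry; after max_revisions the best effort is kept.
            query = f"{sub_query} (context: {thought})"
        answers.append(answer)
    return answers
```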

3. Formal Models and Algorithmic Structures

Most Plan*RAG systems model planning as a symbolic or partially symbolic process:

  • DAG-Based Reasoning (Plan*RAG) (Verma et al., 2024): The plan is a DAG $\mathcal{G} = (V, E)$ whose nodes $q_i$ correspond to atomic queries. For each node $q$, the system instantiates the sub-query $\tilde{q}$ by injecting the answers of its parent nodes, retrieves supporting documents, and generates the sub-answer $G(q)$. Parallelization is enabled for nodes at the same depth (a minimal execution sketch follows this list).
  • State-Transition Model (LevelRAG) (Zhang et al., 25 Feb 2025): High-level planning states $s_t$ track both the set of resolved sub-queries and their interim summaries; actions include decomposition, summarization, verification, and supplementation. Search terminates when all sub-queries’ summaries are judged sufficient for final answer synthesis.
  • Plan-then-Act-and-Review (PAR RAG) (Zhang et al., 23 Apr 2025): The plan is a structured sequence $P = \{(\mathrm{thought}_i, q_i)\}$; at each step, coarse- and fine-grained retrievals are performed, followed by consistency checks. The Action module executes sub-queries; the Review module iteratively verifies or revises sub-answers via multi-passage citation overlap.
  • Iterative Plan–Answer Cycles (RPG) (Lyu et al., 2024): A plan token $p_t$ specifies the next subtopic, guiding paragraph-level retrieval for an answer segment $y_t$. Plan–answer iteration continues until output completion or early stopping.
  • Explicit Plan for Data-Driven QA (PlanRAG) (Lee et al., 2024): The LLM emits a stepwise plan $P$; each step is translated into SQL/Cypher database queries for observation, and the LLM then integrates observations using business rules; re-planning is invoked as needed.
  • Plan-as-Query Retrieval (PERC) (Yoo et al., 2024): Code examples are mapped to pseudocode plans, and retrieval is performed over plan representations; retrieved examples are converted as needed to the target programming language.
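A minimal execution sketch of the DAG model above follows. It assumes the plan arrives as node templates plus parent links; `retrieve` and `generate` are placeholder callables, and the level-by-level loop marks where same-depth nodes could be dispatched in parallel:

```python
# Sketch of DAG-based plan execution: each node's sub-query is instantiated
# by injecting its parents' answers, and nodes are processed level by level.
from collections import defaultdict
from typing import Dict, List

def run_dag_plan(
    nodes: Dict[str, str],          # node id -> atomic query template
    parents: Dict[str, List[str]],  # node id -> parent node ids (a DAG)
    retrieve, generate,             # placeholder retriever/generator callables
) -> Dict[str, str]:
    depth: Dict[str, int] = {}

    def node_depth(n: str) -> int:
        # Longest path from a root determines a node's level in the DAG.
        if n not in depth:
            ps = parents.get(n, [])
            depth[n] = 0 if not ps else 1 + max(node_depth(p) for p in ps)
        return depth[n]

    levels: Dict[int, List[str]] = defaultdict(list)
    for n in nodes:
        levels[node_depth(n)].append(n)

    answers: Dict[str, str] = {}
    for d in sorted(levels):
        for n in levels[d]:  # independent nodes: could run concurrently
            # Instantiate the sub-query by injecting parent answers.
            facts = "; ".join(answers[p] for p in parents.get(n, []))
            sub_query = f"{nodes[n]} [given: {facts}]" if facts else nodes[n]
            answers[n] = generate(sub_query, retrieve(sub_query))
    return answers
```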

4. Domain-Specific Adaptations and Applications

Plan*RAG methodology has been adapted for a range of knowledge-intensive settings:

  • Multi-hop and Long-form Question Answering: Planning delivers direct, measurable gains in multi-step reasoning fidelity, error localization, and overall accuracy. For example, Plan*RAG (Verma et al., 2024) improves HotpotQA accuracy from 25.49% (standard RAG) to 35.67% and F1 from 31.22 to 39.68, while PAR RAG (Zhang et al., 23 Apr 2025) achieves relative EM/F1 uplifts of +31.6% and +37.9% over state-of-the-art baselines on HotpotQA and MuSiQue.
  • Decision-Making over Structured Data: PlanRAG outperforms prior iterative RAG by +15.8 pp in Locating and +7.4 pp in Building scenarios on the Decision QA benchmark (Lee et al., 2024).
  • Medical Plan Generation: MedPlan mirrors clinician workflow via a strict SOAP-style Assessment→Plan ordering: it first produces an assessment, then uses retrieved cross-patient and self-history SOAP records to generate personalized treatment plans, yielding up to +0.3183 BLEU and +0.5213 METEOR (Medical-Mixtral-7B-v2k) (Hsu et al., 23 Mar 2025).
  • Threat Modeling and Hardware Verification: ThreatLens employs multi-agent planners (threat, policy, and test-plan generation) with RAG, reducing manual effort by ~75% and achieving 92% precision in threat filtering on the NEORV32 SoC (Saha et al., 11 May 2025).
  • Code and DSL Generation: PERC’s plan-as-query retrieval (sketched after this list) outperforms code-retrieval baselines in both in- and cross-language settings, e.g., on MultiPL-E: Ruby 67.27% (RepoCoder) → 69.81%, Lua 60.81% → 64.10% (Yoo et al., 2024); Plan+RAG for DSL generation matches fine-tuned baselines in-domain and exceeds them by +7 points similarity on out-of-domain APIs (Bassamzadeh et al., 2024).
  • Embodied AI: P-RAG’s iterative, database-augmented planning improves unseen-task success rates on ALFRED: GPT-4 without retrieval scores 7.05%, P-RAG reaches 14.11% after 3 iterations, and self-iteration pushes this to 27.4% (Xu et al., 2024).
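To make the plan-as-query idea concrete, the sketch below retrieves code examples by similarity in plan space rather than over raw code. The embedding model, the plan-extraction prompt, and plain cosine scoring are illustrative assumptions, not PERC’s exact pipeline:

```python
# Plan-as-query retrieval sketch: map the request to a pseudocode plan, embed
# it, and rank stored examples by cosine similarity of their plan embeddings.
import numpy as np

def plan_as_query_retrieve(request: str, corpus, llm, embed, k: int = 3):
    # corpus: list of (code_example, plan_embedding) pairs, with embeddings
    # precomputed from each example's pseudocode plan.
    plan = llm(f"Write language-agnostic pseudocode steps for: {request}")
    q = embed(plan)
    q = q / np.linalg.norm(q)                       # normalize query vector
    scored = [
        (float(q @ (e / np.linalg.norm(e))), code)  # cosine similarity
        for code, e in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:k]]
```

Because the plan abstracts away surface syntax, the same retrieval index can serve cross-language generation: a retrieved example can be translated to the target language before being placed in the prompt, as PERC does.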

5. Theoretical and Empirical Impact

Plan*RAG approaches advance RAG systems by:

  • Reducing Error Propagation: Top-down planning, coupled with per-step verification, prevents local retrieval/generation failures from corrupting downstream reasoning.
  • Improving Attribution: Atomic sub-queries, each linked to a discrete retrieved document, provide strong evidence traceability; in Plan*RAG, 76% of answers are exact substrings of the retrieved document (Verma et al., 2024).
  • Enhancing Modular Integration: Plug-and-play design supports deployment atop arbitrary LLMs, retrievers (BM25, DPR, Contriever), and verification modules, requiring minimal or no model fine-tuning (Verma et al., 2024, Zhang et al., 25 Feb 2025).
  • Enabling Scalability and Efficiency: Parallel plan step execution and context-bounded node retrieval mitigate context window overflow and reduce unnecessary retrievals, as with the Critic Expert in Plan*RAG (retrieval calls reduced by 19% with negligible accuracy loss) (Verma et al., 2024).
  • Performance Gains: Across domains, Plan*RAG variants match or exceed proprietary models (e.g., LevelRAG surpasses GPT-4o and ReSP) (Zhang et al., 25 Feb 2025), show significant uplift over vanilla one-pass RAG, and exhibit enhanced generalization in low-resource regimes.

6. Limitations, Challenges, and Future Directions

Notable limitations identified in Plan*RAG research include:

  • Computational Overhead: Multi-step planning, per-step retrieval, review modules, and verification add latency (e.g., PAR RAG average RTPQ ≈ 26s) and increase inference cost (Zhang et al., 23 Apr 2025).
  • Planning Quality Sensitivity: Poor initial plan decomposition or specification can cause retrieval to miss critical evidence or narrow the search space excessively (Lyu et al., 2024, Yoo et al., 2024).
  • Database Scalability and Memory: Progressive accumulation of trajectories or intermediate plans may cause database growth and potential retrieval efficiency degradation, as seen in P-RAG (Xu et al., 2024).
  • Limits of Current LLM Reasoners: P-RAG and similar systems plateau as LLM reasoning capabilities (particularly for embodied, non-textual tasks) saturate (Xu et al., 2024).
  • Automatic Plan Extraction: The quality of LLM-generated plans or pseudocode may be a failure point (e.g., erroneous plan steps, unreliable pseudocode extraction in PERC) (Yoo et al., 2024).

Ongoing work investigates learned retriever/reranker modules, adaptive granularity planning, plan critics or quality validators, efficient memory condensation, and cross-modal plan representations (e.g., integrating vision directly in embodied settings). A plausible implication is that tighter coupling between learned planning agents and retrieval subsystems, or joint end-to-end optimization as in trainable consistency/verifier modules, could further enhance accuracy and robustness.

7. Summary Table: Plan*RAG Systems and Key Features

| System | Planning | Retrieval | Result/Claim | Reference |
| --- | --- | --- | --- | --- |
| Plan*RAG | Test-time DAG | Per-node, atomic | +2–6 Acc/F1 on multi-hop QA | (Verma et al., 2024) |
| LevelRAG | Symbolic searcher | Hybrid (sparse/dense/web) | Outperforms GPT-4o; F1 up to 69.33% | (Zhang et al., 25 Feb 2025) |
| PAR RAG | JSON plan, review | Multi-granular | +31.6% EM over baseline on HotpotQA | (Zhang et al., 23 Apr 2025) |
| MedPlan | SOAP plan | Patient + history | BLEU up to 0.3183; 66% ↑ clinical eval | (Hsu et al., 23 Mar 2025) |
| ThreatLens | Multi-agent plan | Vector, iterative | ~75% ↓ manual effort; 92% precision | (Saha et al., 11 May 2025) |
| PERC | Pseudocode plan | Plan-as-query | +1–5 pp Pass@1 on underrepresented PLs | (Yoo et al., 2024) |
| PlanRAG | Stepwise plan | SQL/Cypher generation | +15.8 pp / +7.4 pp accuracy on DQA | (Lee et al., 2024) |
| RPG | Plan token per step | Paragraph selection | +8.5 F1 (2Wiki); +9.1 ROUGE (ASQA) | (Lyu et al., 2024) |
| P-RAG | Progressive planning | History + goal/scene | ALFRED SR: 7.05% → 14.11% → 27.4% | (Xu et al., 2024) |
| Plan+RAG-Code | DSL plan, function | Example + API metadata | +7 points similarity on OOD API DSL generation | (Bassamzadeh et al., 2024) |
