Plan*RAG: Structured Planning in RAG Pipelines
- Plan*RAG is a family of frameworks that introduce symbolic planning into RAG pipelines, decomposing complex queries into atomic sub-problems.
- It employs methodologies like DAG-based planning and state-transition models to guide tailored retrieval and generation steps.
- Plan*RAG enhances performance in multi-hop QA, decision-making, medical planning, and code generation by reducing error propagation and improving interpretability.
Plan*RAG is a collective term for a family of frameworks and methodologies that explicitly introduce symbolic or structured planning into Retrieval-Augmented Generation (RAG) pipelines. Plan*RAG architectures decompose complex tasks into sequenced, often atomic, sub-problems before invoking retrieval, aiming to systematically mitigate error propagation and reasoning drift in knowledge-intensive tasks such as multi-hop question answering, code generation, decision-making, and domain-specific planning. Instantiations span a variety of domains (e.g., text QA, embodied AI, software engineering, medical planning) and share a characteristic “plan-then-retrieve” or “plan-augmented-retrieve” pattern, in contrast to the earlier “retrieve-then-generate” RAG paradigm. Recent Plan*RAG instantiations demonstrate improved accuracy, interpretability, and robustness across zero-shot and domain-adapted settings.
1. Core Principles and Motivations
Plan*RAG introduces planning as an explicit, structured intermediary between the user’s query and the retrieval-augmented generation steps. Whereas classic RAG pipelines condition generation directly on retrieved knowledge chunks, Plan*RAG approaches first produce a formalized plan—such as a reasoning DAG, sequential subgoals, or domain-specific pseudocode—that then guides or parameterizes retrieval:
- Query Decomposition: The input is decomposed into atomic, typically single-hop, sub-queries or reasoning steps that collectively define the high-level reasoning path required to answer the original query (Verma et al., 2024; Zhang et al., 25 Feb 2025; Zhang et al., 23 Apr 2025; Lyu et al., 2024).
- Planning Outside LM Context: The reasoning plan is generated and maintained external to the LLM’s context window, circumventing context-length limitations associated with in-context chain-of-thought prompting. For example, “Plan*RAG” formalizes the plan as a Directed Acyclic Graph (DAG) whose sub-nodes correspond to atomic queries or facts (Verma et al., 2024).
- Error Localization and Attribution: By explicitly separating planning, retrieval, and aggregation, Plan*RAG architectures enable targeted verification and multi-granularity consistency checks at each reasoning step (Zhang et al., 23 Apr 2025).
- Adaptivity and Efficiency: Modular design allows for plug-and-play integration with a range of retrievers, generators, and evaluators, and supports parallel execution of plan steps (Verma et al., 2024).
This paradigm addresses major RAG limitations: fragmented reasoning chains, context overflow, error compounding across multi-hop tasks, and an inability to attribute final outputs to discrete evidentiary supports.
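To ground the decomposition step, the sketch below shows one minimal way an LLM could be prompted to emit a machine-readable plan before any retrieval occurs. This is illustrative only: the `llm_complete` callable and the JSON schema are assumptions for exposition, not the interface of any cited system.

```python
import json

DECOMPOSE_PROMPT = (
    "Decompose the question into atomic, single-hop sub-queries. "
    "Return a JSON list of objects with keys 'id' (int), 'subquery' (str), "
    "and 'depends_on' (a list of ids whose answers this sub-query needs).\n"
    "Question: "
)

def decompose(question, llm_complete):
    """Ask the LLM for a structured plan, parsed into plain Python
    objects that live outside the model's context window."""
    raw = llm_complete(DECOMPOSE_PROMPT + question)  # hypothetical LLM call
    plan = json.loads(raw)
    ids = {step["id"] for step in plan}
    # Dependencies must reference sub-queries that exist in the plan.
    assert all(dep in ids for step in plan for dep in step["depends_on"])
    return plan
```

Because the parsed plan is an ordinary data structure, it can be stored, validated, and executed outside the model's context window, which is precisely the property Plan*RAG exploits (Verma et al., 2024).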
2. Representative Plan*RAG Architectures
A diverse range of Plan*RAG systems has been developed, all sharing a “plan–retrieve–generate” backbone with domain-specific extensions:
| System | Planning Mechanism | Retrieval Strategy | Domain/Application |
|---|---|---|---|
| Plan*RAG (Verma et al., 2024) | Test-time DAG generation, atomic subqueries | Per-node retrieval, Relevance/Critic experts | Multi-hop QA |
| PAR RAG (Zhang et al., 23 Apr 2025) | Top-down plan decomposition, JSON trace | Multi-granularity (coarse+fine) per sub-question | Multi-hop QA |
| LevelRAG (Zhang et al., 25 Feb 2025) | Symbolic high-level searcher, iterative logic planning | Hybrid (sparse/dense/web), query rewriting | QA (single/multi-hop) |
| RPG (Lyu et al., 2024) | Iterative plan-token prediction, plan–answer cycles | Fine-grained selection, multi-task prompt tuning | Knowledge-intensive QA |
| PlanRAG (Lee et al., 2024) | Explicit subgoal generation for decision analysis | Plan-driven SQL/Cypher queries | Decision-making QA |
| PERC (Yoo et al., 2024) | Pseudocode plan-based retrieval, plan as query | Semantic retrieval over plan representations | Code generation |
| ThreatLens (Saha et al., 11 May 2025) | Multi-agent LLM planners for threat/policy/test plan | Vector RAG, iterative user–agent loop | Hardware security |
| MedPlan (Hsu et al., 23 Mar 2025) | Strict SOAP-inspired (Assessment→Plan) planning | Plan- and history-level retrieval | Medical plan generation |
| P-RAG (Xu et al., 2024) | Progressive, iterative plan–retrieve cycles | Scene+goal similarity, growing DB | Embodied task planning |
| Plan+RAG-Code (Bassamzadeh et al., 2024) | DSL plan structuring, function- and few-shot retrieval | API/function metadata + example code | NL to DSL/Automation |
A common pattern is sequential or iterative execution: a plan is composed (by the LLM or auxiliary planner), each plan element triggers a tailored retrieval and generation step (often re-ranked or filtered for relevance), results are aggregated, and—when required—verifiers or critics introduce plan revision or error correction.
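This shared loop can be made concrete in a short sketch. The following is a minimal, illustrative Python rendering of the plan–retrieve–generate pattern, not any single cited system's implementation; `retrieve`, `generate`, and `verify` are hypothetical callables standing in for whatever retriever, generator, and critic a given framework plugs in.

```python
def plan_retrieve_generate(question, plan, retrieve, generate, verify,
                           max_revisions=2):
    """Generic plan-retrieve-generate loop: each plan step triggers a
    tailored retrieval and generation pass, a verifier can demand a
    retry, and per-step answers are aggregated into the final response."""
    answers = {}
    for step in plan:  # assumes steps arrive in dependency order
        # Inject the answers of prerequisite steps into the sub-query.
        known = "; ".join(str(answers[d]) for d in step["depends_on"])
        sub_q = step["subquery"] + (f" (known facts: {known})" if known else "")
        ans = None
        for _ in range(max_revisions + 1):
            docs = retrieve(sub_q)          # per-step, tailored retrieval
            ans = generate(sub_q, docs)     # sub-answer grounded in docs
            if verify(sub_q, docs, ans):    # consistency / citation check
                break                       # accept; otherwise revise
        answers[step["id"]] = ans
    # Final synthesis conditions only on the aggregated sub-answers,
    # keeping the LM context bounded regardless of plan length.
    evidence = [f"{k}: {v}" for k, v in answers.items()]
    return generate(question, evidence)
```

Systems differ mainly in how `verify` is realized (citation overlap in PAR RAG, critic experts in Plan*RAG) and in whether a failed check triggers revision of the sub-answer, the sub-query, or the plan itself.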
3. Formal Models and Algorithmic Structures
Most Plan*RAG systems model planning as a symbolic or partially symbolic process:
- DAG-Based Reasoning (Plan*RAG) (Verma et al., 2024): The plan is a DAG $\mathcal{G}=(V,E)$ with nodes corresponding to atomic queries. For each node $v \in V$, the system instantiates the sub-query $q_v$ by injecting the answers of its parent nodes, retrieves supporting documents $D_v$, and generates the sub-answer $a_v$. Parallelization is enabled for nodes at the same depth (a minimal execution sketch follows this list).
- State-Transition Model (LevelRAG) (Zhang et al., 25 Feb 2025): High-level planning states track both the set of resolved subqueries and their interim summaries; actions include decomposition (“decompose”), summarization, verification, and supplementation. Search is terminated when all subqueries’ summaries are judged sufficient for final answer synthesis.
- Plan-then-Act-and-Review (PAR RAG) (Zhang et al., 23 Apr 2025): The plan is a structured sequence of steps $(p_1, \dots, p_n)$; at each step $p_i$, coarse- and fine-grained retrievals are performed, followed by consistency checks. The Action module executes sub-queries; the Review module iteratively verifies or revises sub-answers via multi-passage citation overlap.
- Iterative Plan–Answer Cycles (RPG) (Lyu et al., 2024): At each cycle, a plan token specifies the next subtopic, guiding paragraph-level retrieval for the corresponding answer segment. Plan–answer iteration continues until output completion or early stopping.
- Explicit Plan for Data-Driven QA (PlanRAG) (Lee et al., 2024): The LLM emits a stepwise plan; each step is translated into SQL/Cypher database queries for observation, after which the LLM integrates the observations using business rules; re-planning is invoked as needed.
- Plan-as-Query Retrieval (PERC) (Yoo et al., 2024): Code examples are mapped to pseudocode plans, and retrieval is performed over plan representations; retrieved examples are converted as needed to the target programming language.
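As a concrete illustration of the DAG model, the following sketch executes a plan level by level, running independent nodes in parallel. It is a simplified reading of the Plan*RAG procedure rather than the authors' code; `solve_node` is an assumed callable performing the per-node retrieval and generation.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(plan, solve_node):
    """Execute a reasoning DAG level by level: every node whose parents
    are already resolved runs in parallel with its siblings, mirroring
    the observation that same-depth sub-queries are independent."""
    answers = {}
    pending = {step["id"]: step for step in plan}
    with ThreadPoolExecutor() as pool:
        while pending:
            # A node is ready once every parent answer is available.
            ready = [s for s in pending.values()
                     if all(d in answers for d in s["depends_on"])]
            if not ready:
                raise ValueError("plan is not a DAG (cycle detected)")
            futures = {
                s["id"]: pool.submit(solve_node, s,
                                     {d: answers[d] for d in s["depends_on"]})
                for s in ready
            }
            for node_id, fut in futures.items():
                answers[node_id] = fut.result()  # sub-answer for this node
                del pending[node_id]
    return answers
```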
4. Domain-Specific Adaptations and Applications
Plan*RAG methodology has been adapted for a range of knowledge-intensive settings:
- Multi-hop and Long-form Question Answering: Direct evidence for improved multi-step reasoning fidelity, error localization, and overall accuracy. For example, Plan*RAG (Verma et al., 2024) improves HotpotQA accuracy from 25.49% (standard RAG) to 35.67% and F1 from 31.22 to 39.68, while PAR RAG (Zhang et al., 23 Apr 2025) achieves relative EM/F1 uplifts of +31.6% and +37.9% over state-of-the-art baselines on HotpotQA and MuSiQue.
- Decision-Making over Structured Data: PlanRAG outperforms prior iterative RAG by +15.8 pp in Locating and +7.4 pp in Building scenarios on the Decision QA benchmark (Lee et al., 2024).
- Medical Plan Generation: MedPlan's two-stage Assessment→Plan design mirrors clinician SOAP workflow: it first produces an assessment, then uses retrieved cross-patient and self-history SOAP records to generate personalized treatment plans, yielding BLEU up to 0.3183 and METEOR up to 0.5213 with Medical-Mixtral-7B-v2k (Hsu et al., 23 Mar 2025).
- Threat Modeling and Hardware Verification: ThreatLens employs multi-agent planners (threat, policy, plan generation) with RAG, reducing manual effort ~75% and achieving 92% precision in threat filtering on NEORV32 SoC (Saha et al., 11 May 2025).
- Code and DSL Generation: PERC’s plan-as-query retrieval outperforms code retrieval baselines in both in- and cross-language settings, e.g., in MultiPL-E, Ruby: 67.27% (RepoCoder) → 69.81%, Lua: 60.81% → 64.10% (Yoo et al., 2024); Plan+RAG for DSL generation matches fine-tuned baselines in-domain and exceeds them by +7 pts similarity on out-of-domain APIs (Bassamzadeh et al., 2024).
- Embodied AI: P-RAG's iterative, database-augmented planning improves unseen-task success rates on ALFRED: GPT-4 without retrieval achieves 7.05%, P-RAG reaches 14.11% after three iterations, and self-iteration pushes this to 27.4% (Xu et al., 2024).
5. Theoretical and Empirical Impact
Plan*RAG approaches advance RAG systems by:
- Reducing Error Propagation: Top-down planning, coupled with per-step verification, prevents local retrieval/generation failures from corrupting downstream reasoning.
- Improving Attribution: Atomic subqueries, each linked to a discrete retrieved document, provide strong evidence traceability—76% of answers are exact substrings of the retrieved doc in Plan*RAG (Verma et al., 2024).
- Enhancing Modular Integration: Plug-and-play design supports deployment atop arbitrary LLMs, retrievers (BM25, DPR, Contriever), and verification modules, requiring minimal or no model fine-tuning (Verma et al., 2024; Zhang et al., 25 Feb 2025).
- Enabling Scalability and Efficiency: Parallel plan-step execution and context-bounded per-node retrieval mitigate context-window overflow and reduce unnecessary retrievals, as with the Critic Expert in Plan*RAG, which cuts retrieval calls by 19% with negligible accuracy loss (Verma et al., 2024); a simplified sketch of such a retrieval gate follows this list.
- Performance Gains: Across domains, Plan*RAG variants match or exceed proprietary models (e.g., LevelRAG surpasses GPT-4o and ReSP (Zhang et al., 25 Feb 2025)), show significant uplift over vanilla one-pass RAG, and exhibit enhanced generalization in low-resource regimes.
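The retrieval-gating idea behind Plan*RAG's Critic Expert can be sketched as follows. This is an assumed, simplified formulation rather than the paper's implementation: a hypothetical `critic` callable decides whether inherited parent answers already suffice, so `retrieve` is only invoked when new evidence is actually needed.

```python
def solve_with_critic(sub_q, parent_answers, critic, retrieve, generate):
    """Gate retrieval behind a critic: when inherited parent answers
    already suffice, answer directly and skip the retrieval call."""
    context = "; ".join(parent_answers)
    if critic(sub_q, context):      # critic judges: no new evidence needed
        return generate(sub_q, [context])
    docs = retrieve(sub_q)          # otherwise pay for a retrieval call
    return generate(sub_q, docs + [context])
```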
6. Limitations, Challenges, and Future Directions
Notable limitations identified in Plan*RAG research include:
- Computational Overhead: Multi-step planning, per-step retrieval, review modules, and verification add latency (e.g., PAR RAG's average response time per query (RTPQ) is ≈ 26 s) and increase inference cost (Zhang et al., 23 Apr 2025).
- Planning Quality Sensitivity: Poor initial plan decomposition or specification can cause retrieval to miss critical evidence or narrow the search space excessively (Lyu et al., 2024, Yoo et al., 2024).
- Database Scalability and Memory: Progressive accumulation of trajectories or intermediate plans may cause database growth and potential retrieval efficiency degradation, as seen in P-RAG (Xu et al., 2024).
- Limits of Current LLM Reasoners: P-RAG and similar systems plateau as LLM reasoning capabilities (particularly for embodied, non-textual tasks) saturate (Xu et al., 2024).
- Automatic Plan Extraction: The quality of LLM-generated plans or pseudocode may be a failure point (e.g., erroneous plan steps, unreliable pseudocode extraction in PERC) (Yoo et al., 2024).
Ongoing work investigates learned retriever/reranker modules, adaptive granularity planning, plan critics or quality validators, efficient memory condensation, and cross-modal plan representations (e.g., integrating vision directly in embodied settings). A plausible implication is that tighter coupling between learned planning agents and retrieval subsystems, or joint end-to-end optimization as in trainable consistency/verifier modules, could further enhance accuracy and robustness.
7. Summary Table: Plan*RAG Systems and Key Features
| System | Planning | Retrieval | Result/Claim | Reference |
|---|---|---|---|---|
| Plan*RAG | Test-time DAG | Per-node, atomic | +2–6 Acc/F1 on multi-hop QA | (Verma et al., 2024) |
| LevelRAG | Symbolic searcher | Hybrid (sparse/web/dense) | Outperforms GPT-4o; F1 up to 69.33% | (Zhang et al., 25 Feb 2025) |
| PAR RAG | JSON plan, review | Multi-granular | +31.6% EM over baseline on HotpotQA | (Zhang et al., 23 Apr 2025) |
| MedPlan | SOAP plan | Patient+history | BLEU up to 0.3183, 66% ↑ clinical eval | (Hsu et al., 23 Mar 2025) |
| ThreatLens | Multi-agent plan | Vector, iterative | 75% manual effort ↓, 92% precision | (Saha et al., 11 May 2025) |
| PERC | Pseudocode plan | Plan-as-query | +1–5 pp Pass@1 on underrepresented PLs | (Yoo et al., 2024) |
| PlanRAG | Stepwise plan | SQL/Cypher gen | +15.8pp / +7.4pp accuracy on DQA | (Lee et al., 2024) |
| RPG | Plan token per step | Paragraph select | +8.5 F1 (2Wiki), +9.1 ROUGE (ASQA) | (Lyu et al., 2024) |
| P-RAG | Progressive planning | History+goal/scene | ALFRED success rate 7.05%→14.11%→27.4% | (Xu et al., 2024) |
| Plan+RAG-Code | DSL plan, function | Example+API meta | +7 pts sim on OOD API DSL generation | (Bassamzadeh et al., 2024) |
References
- Plan*RAG: Efficient Test-Time Planning for Retrieval Augmented Generation (Verma et al., 2024)
- LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers (Zhang et al., 25 Feb 2025)
- Credible plan-driven RAG method for Multi-hop Question Answering (Zhang et al., 23 Apr 2025)
- MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation (Hsu et al., 23 Mar 2025)
- ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification (Saha et al., 11 May 2025)
- PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation (Yoo et al., 2024)
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative LLMs as Decision Makers (Lee et al., 2024)
- Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation (Lyu et al., 2024)
- Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task (Xu et al., 2024)
- Plan with Code: Comparing approaches for robust NL to DSL generation (Bassamzadeh et al., 2024)