Plan-Based Generation Overview

Updated 7 April 2026

Plan-Based Generation is a methodology that decomposes complex tasks into intermediate plans, clarifying what to output and in which order.
It improves controllability, factual accuracy, and modularity by using varied plan representations like outlines, anchor tokens, and graphical models.
Empirical findings indicate reduced hallucinations and enhanced reliability, making this approach effective for structured text, data-to-text, and multi-agent systems.

Plan-Based Generation encompasses a family of methods that explicitly decompose complex generation tasks (text, code, structured data, control sequences) into intermediate representations, called plans, prior to the final realization. This paradigm enhances controllability, transparency, factual accuracy, and modularity across a wide range of tasks, including natural language generation, data-to-text systems, code generation, robotics, and multi-agent planning. Plans may take the form of outlines, anchor words, question–answer blueprints, ordered token sequences, paragraph-level instructions, or graphical control-flow models, and typically structure “what to say” and “in what order” before the final output is rendered.

1. Formal Foundations and Taxonomy

The essential principle of plan-based generation is the explicit factorization of the generative process into at least two stages: first, construction of an intermediate plan $B$ (also denoted $P$, $z$, or $C$ depending on context); second, realization of the surface output $y$ from this plan. A typical probabilistic formalization is $P(B, y \mid x) = P(B \mid x)\, P(y \mid x, B)$, where $x$ denotes the input (a user query, document set, knowledge graph, or other signal), $B$ the blueprint or plan (a sequence of subgoals, questions, or content selectors), and $y$ the final output. The nature of $B$ defines the plan’s granularity: it can be a sequence of question–answer pairs (Huot et al., 2023), an ordered list of content units (Su et al., 2021), a rooted tree structure (You et al., 2023), a set of anchor tokens (Jhamtani et al., 2020), or graphical process models (Redis et al., 2024, Abe et al., 2 Apr 2025).
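Two standard consequences of this factorization may help make it concrete. They are textbook identities rather than results taken from any of the cited systems, and the approximate posterior $q$ is a symbol introduced here purely for illustration. Taking logarithms splits the joint objective into a planning term and a realization term, and summing over plans gives the likelihood of the output alone:

$$\log P(B, y \mid x) = \log P(B \mid x) + \log P(y \mid x, B)$$

$$P(y \mid x) = \sum_{B} P(B \mid x)\, P(y \mid x, B)$$

When the plan is not observed and the sum is intractable, one common option is to optimize a variational lower bound with an assumed approximate posterior $q(B \mid x, y)$:

$$\log P(y \mid x) \ge \mathbb{E}_{q(B \mid x, y)}\big[\log P(y \mid x, B)\big] - \mathrm{KL}\big(q(B \mid x, y) \,\|\, P(B \mid x)\big)$$

With supervised plans, the first identity suggests that the planner and the realizer can be trained on separate loss terms; with latent plans, the bound replaces the marginal.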

Archetypal plan-based generation settings include:

Conditional Text Generation: Plan as blueprint QA pairs (Huot et al., 2023), content outlines, or paragraph structures (Godbole et al., 2024, Wu et al., 26 Feb 2025), narrative trees (You et al., 2023), or anchor word sequences (Jhamtani et al., 2020).
Data-to-Text: Plans as ordered lists of table slots or facts (Su et al., 2021), or as sentence-structured graphs (Moryossef et al., 2019).
Retrieval-Augmented Generation (RAG) and Code Generation: Plans guide retrieval and in-context selection of facts or code examples via pseudocode queries (Godbole et al., 2024, Yoo et al., 2024).
Robotics and Multi-Agent Systems: Plans as temporal event sequences, simple temporal networks, or agent–status graphs (Abe et al., 2 Apr 2025, Ma et al., 2018, Muhtadin et al., 24 Dec 2025).
Skill Learning and Process Mining: Structured process models as plans for parallelism and interpretability (Redis et al., 2024).

2. Algorithmic Frameworks and Model Architectures

Variants of plan-based generation diverge with respect to plan representation, generative architecture, supervision level, and degree of interactivity.

Text Generation: Transformer encoder–decoders are standard. In “Text-Blueprint” (Huot et al., 2023), a LongT5 encoder–decoder generates QA-pair blueprints (plan $B$), then uses cross-attention to realize summaries conditioned on both input and blueprint. The “Plan-then-Generate” approach (Su et al., 2021) uses a BERT-based content planner over structured input $x$ to produce a plan $B$, followed by a BART-based generator consuming $x$ and $B$. “Latent Anchor Plan” (Jhamtani et al., 2020) introduces a latent variable $z$ (a sequence of anchor tokens) sampled from a learned or inferred prior, conditioning generation at a per-sentence level via LSTM-based architectures, with learning handled by amortized variational inference.
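The two-stage shape of these text systems can be illustrated with a deliberately small sketch. It is not the architecture of any cited paper: the neural planner and generator are swapped for a rule-based slot selector and string templates, and the names `plan_content`, `realize`, the slot keys, and the example record are all hypothetical.

```python
from typing import Dict, List


def plan_content(record: Dict[str, str], preferred_order: List[str]) -> List[str]:
    """Stage 1 (planner): pick which slots to express and in what order.

    Assumption: a fixed preference list stands in for a learned planner.
    """
    return [slot for slot in preferred_order if slot in record]


def realize(record: Dict[str, str], plan: List[str]) -> str:
    """Stage 2 (realizer): verbalize exactly the planned slots, in plan order.

    Assumption: hand-written templates stand in for a neural generator.
    """
    templates = {
        "name": "{v} is a restaurant",
        "food": "it serves {v} food",
        "area": "it is located in the {v}",
    }
    clauses = [templates[slot].format(v=record[slot]) for slot in plan]
    if not clauses:
        return ""
    text = "; ".join(clauses)
    return text[0].upper() + text[1:] + "."


record = {"name": "Aromi", "food": "Italian", "area": "city centre"}

# The planner skips "price" because the record has no such slot.
plan = plan_content(record, ["name", "area", "food", "price"])
full_text = realize(record, plan)

# Editing the plan changes the output without touching the realizer.
short_text = realize(record, ["name", "food"])
```

Because the realizer only sees the plan, dropping or reordering plan items is the single point of control over content and order, which is the property the cited systems exploit at much larger scale.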

Code Generation: “Plan-As-Query Example Retrieval” (PERC) (Yoo et al., 2024) operates by converting both the code pool and the queries into pseudocode plans, retrieving the top-k examples whose plans best match the query plan, and assembling (description, plan, code) tuples for few-shot prompting of the code LLM.
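The retrieval step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PERC's implementation: plans are given as plain strings instead of being produced by an LLM, similarity is a simple Jaccard overlap of plan tokens (a stand-in for whatever retriever a real system uses), and the names `Example`, `plan_similarity`, `retrieve`, and `build_prompt` are hypothetical.

```python
from typing import List, NamedTuple, Set


class Example(NamedTuple):
    description: str
    plan: str  # pseudocode plan, assumed to be precomputed
    code: str


def _tokens(plan: str) -> Set[str]:
    return set(plan.lower().split())


def plan_similarity(a: str, b: str) -> float:
    """Jaccard overlap between plan tokens (an assumed, toy similarity)."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query_plan: str, pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples whose plans best match the query plan."""
    ranked = sorted(pool, key=lambda ex: plan_similarity(query_plan, ex.plan), reverse=True)
    return ranked[:k]


def build_prompt(task: str, query_plan: str, shots: List[Example]) -> str:
    """Assemble (description, plan, code) shots followed by the open query."""
    parts = [f"# Description: {ex.description}\n# Plan: {ex.plan}\n{ex.code}" for ex in shots]
    parts.append(f"# Description: {task}\n# Plan: {query_plan}\n")
    return "\n\n".join(parts)


pool = [
    Example("sum a list", "loop over items and accumulate total",
            "def total(xs):\n    return sum(xs)"),
    Example("reverse a string", "walk characters backwards and join",
            "def rev(s):\n    return s[::-1]"),
    Example("count even numbers", "loop over items and count when even",
            "def evens(xs):\n    return sum(1 for x in xs if x % 2 == 0)"),
]

query_plan = "loop over items and accumulate product"
shots = retrieve(query_plan, pool, k=2)
prompt = build_prompt("multiply a list", query_plan, shots)
```

Matching on plans rather than on raw descriptions or code means that two tasks with different surface wording but the same algorithmic outline can still retrieve each other.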

Retrieval-Augmented Knowledge Generation: “Retrieve-Plan-Generation” (RPG) (Lyu et al., 2024) employs an explicit plan–answer loop. At each iteration, the model generates plan tokens, uses them to select fine-grained evidence, and then generates an answer segment; the process repeats until the answer is complete. Soft prompts and low-rank adapters enable multi-task conditioning (planning vs. answering) without full LLM fine-tuning.
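The control flow of such a plan–answer loop can be sketched in a few lines. This is only an outline of the loop structure under simplifying assumptions: the planner and answerer are passed in as plain callables (toy stand-ins for an LLM), evidence selection is keyword overlap, and every function name here is hypothetical.

```python
from typing import Callable, List, Optional


def select_evidence(plan: str, passages: List[str]) -> List[str]:
    """Keep passages that share at least one word with the plan (toy selector)."""
    plan_words = set(plan.lower().split())
    return [p for p in passages if plan_words & set(p.lower().rstrip(".").split())]


def rpg_loop(
    question: str,
    passages: List[str],
    plan_step: Callable[[str, List[str]], Optional[str]],
    answer_step: Callable[[str, List[str]], str],
    max_steps: int = 4,
) -> str:
    """Alternate plan -> evidence selection -> answer segment until done."""
    segments: List[str] = []
    for _ in range(max_steps):
        plan = plan_step(question, segments)  # what to cover next, or None to stop
        if plan is None:
            break
        evidence = select_evidence(plan, passages)
        segments.append(answer_step(plan, evidence))
    return " ".join(segments)


# Toy planner: walk through a fixed topic list (a real planner would be a model).
TOPICS = ["capital", "euro"]


def toy_plan_step(question: str, segments: List[str]) -> Optional[str]:
    return TOPICS[len(segments)] if len(segments) < len(TOPICS) else None


def toy_answer_step(plan: str, evidence: List[str]) -> str:
    return evidence[0] if evidence else f"No evidence found for {plan}."


passages = [
    "Paris is the capital of France.",
    "France uses the euro.",
    "Berlin is in Germany.",
]
answer = rpg_loop("Tell me about France.", passages, toy_plan_step, toy_answer_step)
```

The point of the structure is that each answer segment only sees evidence filtered by the current plan, which is how plan tokens are meant to limit off-topic drift.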

Robotics and Control: In “LLM-mediated Multi-Agent Planning” (Abe et al., 2 Apr 2025), environmental statuses and goals are encoded as discrete symbols, with a GPT-4o-based agent generator recursively constructing triplets; plans are realized as edges in a directed status–agent graph. Hierarchical decompositions (task-level then motion-level) appear in multi-robot systems (Ma et al., 2018), with simple temporal networks employed for schedule feasibility.
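To make the role of simple temporal networks more concrete, the sketch below shows the textbook feasibility check: encode each constraint `lo <= t_j - t_i <= hi` as two weighted edges and test for negative cycles with Floyd–Warshall. This is the generic algorithm, not necessarily the procedure used by Ma et al., and the function name and the example events are hypothetical.

```python
from typing import List, Tuple

INF = float("inf")

# A constraint (i, j, lo, hi) means: lo <= t_j - t_i <= hi.
Constraint = Tuple[int, int, float, float]


def stn_is_consistent(num_events: int, constraints: List[Constraint]) -> bool:
    """Return True if some assignment of event times satisfies all constraints."""
    n = num_events
    # Distance graph: edge i->j with weight hi, edge j->i with weight -lo.
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)
        d[j][i] = min(d[j][i], -lo)
    # All-pairs shortest paths (Floyd-Warshall).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative value on the diagonal signals a negative cycle, i.e. infeasibility.
    return all(d[i][i] >= 0 for i in range(n))


# Events: 0 = mission start, 1 = robot A reaches the door, 2 = robot B passes it.
feasible = [
    (0, 1, 5, 10),  # A arrives 5 to 10 time units after start
    (1, 2, 2, 4),   # B passes 2 to 4 units after A arrives
    (0, 2, 0, 12),  # B must be through within 12 units of start
]
# Tightening the deadline to 6 is impossible: B cannot pass before t = 7.
infeasible = feasible[:2] + [(0, 2, 0, 6)]

ok = stn_is_consistent(3, feasible)
bad = stn_is_consistent(3, infeasible)
```

The cubic cost is acceptable for small schedules; larger networks would typically call for incremental or sparse variants.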

Skill Learning: Process mining is used to discover, store, and retrieve structured control-flow plans as reusable “skills,” augmenting LLM-based planners with the ability to parallelize and interpret action traces (Redis et al., 2024).

3. Evaluation Protocols and Empirical Findings

Comprehensive evaluation of plan-based generation spans automatic metrics, human judgments, and ablation studies.

Factuality and Faithfulness: Plan-guided generation is reported to improve both overlap metrics and faithfulness relative to no-plan baselines, e.g., ROUGE-1/2/L gains (+1.5/+1.2 over no-plan T5), a 10 to 15 point reduction in hallucination, and a human faithfulness preference of 80% vs. 55% for non-plan baselines (Huot et al., 2023, Godbole et al., 2024).
Controllability and Interpretability: The explicit plan allows direct manipulation (add/remove/edit questions, reorder content units), supporting desired rhetorical flow and forcing inclusion/exclusion of facts (Huot et al., 2023, Su et al., 2021). Human raters consistently prefer plan-based outputs for control and traceability (Huot et al., 2023, Wu et al., 26 Feb 2025).
Retrieval Coverage and Attribution: Plan-based retrieval settings show AIS_strict up to 90%, +15–25 points over one-shot retrieval; ROUGE-2 precision up ∼13 points (Godbole et al., 2024). Fine-grained evidence selection enabled by plan tokens increases precision and reduces off-topic drift (Lyu et al., 2024).
Efficiency and Scalability: Neural planners achieve several orders of magnitude speedup over exhaustive enumeration for data-to-text tasks, with linear rather than exponential scaling (Moryossef et al., 2019, Su et al., 2021).
Long-Form Generation: Plan-based sectioning improves length-following (Len >0.9 at each section) and yields +16 points in overall quality over direct generation across long-form domains (arXiv, Wikipedia, blogs) (Wu et al., 26 Feb 2025).
Robotics: LLM-generated agent networks achieve 70%+ coverage of human-constructed planning graphs; naive expansion leads to combinatorial failures beyond 400 agents, suggesting a need for selective expansion heuristics (Abe et al., 2 Apr 2025).

Typical evaluation includes standard automatic metrics (BLEU, ROUGE, METEOR, BERTScore F1, self-BLEU/diversity, PARENT), human or LLM-based correctness and preference scoring, plan-adherence measures, and coverage/consistency checks tailored to the application.
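As an illustration of what a plan-adherence or coverage check might look like, the sketch below scores an output against a list of plan items. Both metrics are hypothetical simplifications based on case-insensitive substring matching; they are not the measures used in the cited papers, which rely on stronger matching such as entailment or QA models.

```python
from typing import List


def plan_coverage(plan: List[str], output: str) -> float:
    """Fraction of plan items that appear in the output (substring match)."""
    if not plan:
        return 1.0
    text = output.lower()
    return sum(item.lower() in text for item in plan) / len(plan)


def order_adherence(plan: List[str], output: str) -> float:
    """Fraction of adjacent covered plan items that appear in plan order."""
    text = output.lower()
    positions = [text.find(item.lower()) for item in plan]
    found = [p for p in positions if p >= 0]  # ignore items that are missing
    pairs = list(zip(found, found[1:]))
    if not pairs:
        return 1.0
    return sum(a < b for a, b in pairs) / len(pairs)


plan = ["founded", "revenue", "headquarters"]

in_order = "Founded in 1999, it grew revenue from its headquarters in Oslo."
out_of_order = "The company reported record revenue after it was founded in 1999."

coverage_good = plan_coverage(plan, in_order)
order_good = order_adherence(plan, in_order)
coverage_bad = plan_coverage(plan, out_of_order)
order_bad = order_adherence(plan, out_of_order)
```

Separating coverage from order is a design choice: an output can mention every planned item and still violate the planned sequence, and the two failures usually call for different fixes.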

4. Design Variations and Use-Case Specialization

Plan-based generation is instantiated with diverse plan representations, supervision schemes, and downstream realization modules.

Plan Granularity: Section- or paragraph-level plans (summaries or outlines) for long-form texts (Wu et al., 26 Feb 2025); sentence-level anchor tokens for stories (Jhamtani et al., 2020); graph/tree/dependency structures for data-to-text (Moryossef et al., 2019, Su et al., 2021, You et al., 2023).
Supervision: Ranges from manual plan annotation or extraction (QA-based iterative plan extraction in EIPE-text (You et al., 2023)) to unsupervised, variationally-induced plans (Jhamtani et al., 2020).
Domain Inputs: Inclusion of retrieval pipelines for knowledge grounding (web search/bi-encoder/cross-encoder stacks), especially in code generation (Yoo et al., 2024), clinical decision support (Hsu et al., 23 Mar 2025), or scientific writing (Godbole et al., 2024).
User Interactivity: Web-based UIs for plan visualization and editing (Text-Blueprint (Huot et al., 2023)); multi-stage plan–edit–generate loops.
Architectural Control: From sequence-to-sequence Transformers with concatenated or prefixed plan tokens (Huot et al., 2023, Su et al., 2021) to hybrid systems with structured prompts or module chaining.
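The difference between prefixed and concatenated plan tokens, and the way an edited plan re-enters the pipeline, can be sketched with plain string handling. The marker strings and function names below are hypothetical; real systems define their own separators and tokenization.

```python
from typing import List, Tuple

# Hypothetical marker strings; not taken from any cited system.
PLAN_MARKER = "[PLAN]"
TEXT_MARKER = "[TEXT]"

QAPair = Tuple[str, str]


def serialize_plan(qa_pairs: List[QAPair]) -> str:
    """Flatten a QA blueprint into a single string."""
    return " ".join(f"Q: {q} A: {a}" for q, a in qa_pairs)


def build_target(qa_pairs: List[QAPair], text: str) -> str:
    """Prefixed plan: the decoder is trained to emit the plan, then the text."""
    return f"{PLAN_MARKER} {serialize_plan(qa_pairs)} {TEXT_MARKER} {text}"


def build_source(document: str, qa_pairs: List[QAPair]) -> str:
    """Concatenated plan: a (possibly user-edited) plan is appended to the input."""
    return f"{document} {PLAN_MARKER} {serialize_plan(qa_pairs)}"


def split_output(generated: str) -> Tuple[str, str]:
    """Separate a generated sequence back into (plan, text)."""
    head, sep, text = generated.partition(TEXT_MARKER)
    if not sep:  # no text marker found: treat everything as text
        return "", generated.strip()
    plan = head.replace(PLAN_MARKER, "", 1).strip()
    return plan, text.strip()


qa = [("Who founded the lab?", "Ada"), ("When?", "1999")]
target = build_target(qa, "Ada founded the lab in 1999.")
plan_str, text_str = split_output(target)

# Plan-edit-generate: the user drops the second question, and the edited
# plan is fed back as part of the input for regeneration.
edited_source = build_source("DOC", qa[:1])
```

Keeping the plan as an explicit, parseable span is what makes the edit step possible: the same string a model generates can be shown to a user, changed, and fed back in.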

5. Applications and Impact Across Domains

Plan-based generation enables transparent, configurable, and robust content creation in tasks requiring factual accuracy, procedural correctness, or human-AI collaboration.

Text Summarization & Report Generation: Query-focused multi-document summarization, legal/scientific reporting, interactive editorial workflows (Huot et al., 2023).
Long-Form and Narrative Construction: Hierarchical planning yields more coherent and globally consistent novels, essays, and presentations (You et al., 2023, Wu et al., 26 Feb 2025).
Data-to-Text Realization: Explicit content plans allow rhetorically or factually controlled output from knowledge graphs or tables (Moryossef et al., 2019, Su et al., 2021).
Code Generation: Plan-based retrieval and example selection outperform RAG baselines, especially in low-resource languages or cross-domain transfer (Yoo et al., 2024).
Autonomous Robotics: LLM-derived plans as structured agent graphs or simple temporal networks scale up multi-agent path planning, dynamic task decomposition, and real-world navigation (Abe et al., 2 Apr 2025, Ma et al., 2018, Muhtadin et al., 24 Dec 2025).
Clinical Decision-Support: Sequential plan architectures enhance medical reasoning workflows, aligning output with SOAP standards and integrating historical patient context (Hsu et al., 23 Mar 2025).
Process Automation and Skill Learning: Control-flow process models mined from plan traces bring parallelism, interpretability, and retrieval to LLM-based planners (Redis et al., 2024).

6. Limitations, Challenges, and Future Directions

Principal open issues in plan-based generation concern the following:

Plan Quality Dependency: The overall output fidelity depends on accurate and relevant initial plan generation; suboptimal plans propagate errors downstream (Huot et al., 2023, Godbole et al., 2024, Su et al., 2021).
Latency and Complexity: Two-stage (or multi-stage) reasoning introduces additional computational steps. Iterative refinement and evidence selection can increase latency compared to direct generation (Huot et al., 2023, Lyu et al., 2024).
Error Propagation: Early planning errors are persistent, especially in non-interactive or non-adaptive systems (Huot et al., 2023).
Scalability and Combinatorial Explosion: Unconstrained expansion in symbolic or graph-based planning leads to combinatorial intractability in multi-agent or high-dimensional domains (Abe et al., 2 Apr 2025).
Plan Representation Selection: The optimal granularity, structure, and induction method for plans remain highly context-dependent. Hierarchical or mixed-initiative planning and continual plan correction are active research areas (Huot et al., 2023, You et al., 2023, Wu et al., 26 Feb 2025).
Generalization and Domain Adaptability: Some advances, especially plan-as-query retrieval for code (Yoo et al., 2024) and clinical-plan composition (Hsu et al., 23 Mar 2025), highlight strong cross-domain transfer, but robustness when mixing plan pools or scaling plan libraries requires further investigation.
Integration with Retrieval and Knowledge: Coupling plan-based decomposition with dynamic document retrieval, fine-grained evidence selection, and high-accuracy attribution is an ongoing challenge (Godbole et al., 2024, Lyu et al., 2024).
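The combinatorial-explosion issue listed above is often addressed by limiting how many candidates are expanded at each step. The sketch below shows one generic option, a beam-limited breadth-first expansion; it is an illustrative heuristic and not a method proposed in the cited work, and the scoring function, successor function, and names are all hypothetical.

```python
from typing import Callable, Hashable, Iterable, List


def beam_expand(
    start: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    score: Callable[[Hashable], float],
    beam_width: int,
    max_depth: int,
) -> List[Hashable]:
    """Expand level by level, keeping only the best `beam_width` new nodes."""
    kept = [start]
    frontier = [start]
    for _ in range(max_depth):
        # Collect unseen successors of the current frontier.
        candidates = {n for node in frontier for n in successors(node)} - set(kept)
        # Selective expansion: keep only the highest-scoring candidates.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        if not frontier:
            break
        kept.extend(frontier)
    return kept


def ternary_children(n: int) -> List[int]:
    """Toy successor function: every node has three children."""
    return [3 * n + 1, 3 * n + 2, 3 * n + 3]


def prefer_small(n: int) -> float:
    """Toy score: smaller node ids are considered more promising."""
    return -n


# Unbounded expansion to depth 3 would visit 1 + 3 + 9 + 27 = 40 nodes.
bounded = beam_expand(0, ternary_children, prefer_small, beam_width=2, max_depth=3)
```

With a beam of width `w`, the number of kept nodes grows linearly in depth (at most `1 + w * depth`) instead of exponentially, at the cost of possibly discarding a branch that would have led to a better plan.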

Anticipated future work aims to develop hierarchical and mixed-initiative planning (both model- and user-driven), richer plan formalisms (semantic/discourse/graph-based), process-aware retrievers, interleaved retrieval–plan–generation loops, and real-world applications in collaborative and autonomous systems.

References

(Huot et al., 2023) Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
(Godbole et al., 2024) Analysis of Plan-based Retrieval for Grounded Text Generation
(Su et al., 2021) Plan-then-Generate: Controlled Data-to-Text Generation via Planning
(Moryossef et al., 2019) Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation
(Jhamtani et al., 2020) Narrative Text Generation with a Latent Discrete Plan
(You et al., 2023) EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
(Wu et al., 26 Feb 2025) LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm
(Lyu et al., 2024) Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation
(Yoo et al., 2024) PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation
(Abe et al., 2 Apr 2025) LLM-mediated Dynamic Plan Generation with a Multi-Agent Approach
(Ma et al., 2018) Overview: A Hierarchical Framework for Plan Generation and Execution in Multi-Robot Systems
(Redis et al., 2024) Skill Learning Using Process Mining for LLM Plan Generation
(Hsu et al., 23 Mar 2025) MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation
(Muhtadin et al., 24 Dec 2025) Quadrupped-Legged Robot Movement Plan Generation using LLM