Plan-and-Act Framework
- Plan-and-Act Framework is a modular agent design that explicitly separates high-level planning from low-level execution to manage complex, long-horizon tasks.
- It underpins diverse applications from LLM-based agents to hierarchical reinforcement learning, leveraging structured plans, dynamic replanning, and modular execution.
- Empirical studies show that Plan-and-Act systems outperform reactive models, delivering improved success rates and robust adaptation in dynamic, uncertain environments.
The Plan-and-Act framework refers to a family of agent architectures and methodological paradigms that explicitly structure decision-making as the coordinated interplay of two primary phases: planning—formulating one or more high-level, multi-step, or hierarchical strategies—and acting—executing low-level, often environment-specific actions guided or constrained by those strategies. This paradigm is motivated by limitations in both purely reactive approaches and monolithic end-to-end policies, as well as by the need for scalability, interpretability, and robustness in complex, long-horizon, or open-ended tasks. It spans cognitive robotics, LLM agents, retrieval-augmented reasoning, hierarchical reinforcement learning, and automated reasoning domains.
1. Core Principles and Formalization
Central to Plan-and-Act designs is the explicit separation (and often iterative interleaving) of planning and acting modules. Typically, the architecture comprises a Planner component that ingests a task specification (e.g., user query, goal state, environment snapshot) and outputs a structured, high-level plan—an ordered list of steps, subgoals, or control-flow logic—and an Executor or Actor that operationalizes plan steps into concrete, environment-altering actions.
Formally, a broad class of Plan-and-Act agents can be characterized by the pipeline:
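- Given a context $c$, a Planner policy $\pi_{\text{plan}}$ produces a plan $P \sim \pi_{\text{plan}}(\cdot \mid c)$, typically a sequence of steps $P = (p_1, \dots, p_K)$ (Erdogan et al., 12 Mar 2025, Rawat et al., 15 May 2025).
- An Executor policy $\pi_{\text{exec}}$, possibly conditioned on $P$ and the current environment state $s_t$, selects actions $a_t \sim \pi_{\text{exec}}(\cdot \mid P, s_t)$, updating the context and possibly triggering replanning or adaptation (Erdogan et al., 12 Mar 2025, Yao et al., 2022).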
In dynamic or partially observable environments, Plan-and-Act agents may incorporate replanning based on observed discrepancies between anticipated and actual outcomes, as in belief-space search for robot localization (Colledanchise et al., 2020) or in LLM-based navigation with dynamic plan correction (Erdogan et al., 12 Mar 2025, Rawat et al., 15 May 2025).
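The planner-executor loop with discrepancy-triggered replanning can be sketched in Python. This is a minimal illustration, not the implementation of any cited system: `plan_and_act`, `Trace`, the toy planner and executor, and the replanning cap `max_replans` are all hypothetical, and a real planner would be an LLM or search procedure rather than a hard-coded function.

```python
from dataclasses import dataclass, field
from typing import Callable

Plan = list[str]  # hypothetical: a plan is an ordered list of natural-language steps


@dataclass
class Trace:
    actions: list[str] = field(default_factory=list)
    replans: int = 0


def plan_and_act(
    context: str,
    planner: Callable[[str], Plan],
    executor: Callable[[str, str], tuple[str, str]],
    is_expected: Callable[[str], bool],
    max_replans: int = 3,
) -> Trace:
    """Plan once, execute step by step, and replan when an observation is unexpected."""
    trace = Trace()
    plan = list(planner(context))
    while plan:
        step = plan.pop(0)
        action, observation = executor(step, context)
        trace.actions.append(action)
        context += f"\n{step} -> {observation}"  # executor feedback grows the context
        if not is_expected(observation):
            if trace.replans >= max_replans:
                break  # stop instead of replanning forever
            trace.replans += 1
            plan = list(planner(context))  # replan from the updated context
    return trace


# Toy usage: the planner proposes a recovery step once the context contains an error.
def toy_planner(ctx: str) -> Plan:
    return ["retry login"] if "error" in ctx else ["open page", "click login"]


def toy_executor(step: str, ctx: str) -> tuple[str, str]:
    if step == "click login":
        return "click#login", "error: timeout"
    return step.replace(" ", "_"), "ok"


trace = plan_and_act("log in to the site", toy_planner, toy_executor,
                     lambda obs: not obs.startswith("error"))
```

Capping replans is one simple way to keep a persistently failing environment from causing endless replanning; the cited systems may use other stopping rules.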
2. Representative Instantiations
The Plan-and-Act paradigm encompasses a variety of concrete frameworks and agent systems, including but not limited to:
A. LLM-based Agents for Long-Horizon and Web Tasks:
The "Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks" system separates high-level planning (a Planner model) from low-level execution (an Executor): the Planner writes plans as natural-language steps, and the Executor maps these steps to web actions (e.g., DOM interactions). Training uses large-scale synthetic plan-trajectory data, and the system replans dynamically in response to environment feedback (Erdogan et al., 12 Mar 2025). In ablations, the success rate on WebArena-Lite rises from 9.85% to 53.94% as more sophisticated planning, additional synthetic data, and dynamic replanning are added.
B. Prediction-Reasoning-Action Loops:
PreAct augments ReAct-style agents by incorporating explicit prediction of possible feedback for actions before acting. By reasoning over predicted outcomes, agents gain efficiency and improved planning diversity for complex tasks (Fu et al., 2024).
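A PreAct-style step can be sketched as recording a predicted observation before acting and flagging mismatches in the history passed to the next reasoning call. The `Step` record, the exact-match comparison, and the prompt format are assumptions for illustration; PreAct itself works through LLM prompting, and its prompt structure may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    thought: str
    action: str
    predicted: str      # feedback the agent expects, written before acting
    observed: str = ""  # feedback the environment actually returns

    @property
    def surprised(self) -> bool:
        # Exact string comparison is a simplifying assumption; an LLM judge could replace it.
        return self.observed != self.predicted


def run_step(thought: str, action: str,
             predict: Callable[[str], str], env: Callable[[str], str]) -> Step:
    step = Step(thought, action, predicted=predict(action))
    step.observed = env(action)
    return step


def history_prompt(steps: list[Step]) -> str:
    """Render the trajectory for the next reasoning call, flagging failed predictions."""
    lines = []
    for s in steps:
        lines += [f"Thought: {s.thought}", f"Action: {s.action}",
                  f"Prediction: {s.predicted}", f"Observation: {s.observed}"]
        if s.surprised:
            lines.append("Note: the observation differed from the prediction; revise the plan.")
    return "\n".join(lines)
```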
C. Dual-Controller and Hierarchical Frameworks:
PoAct introduces decoupled policy and action control: a Policy Controller governs the reasoning phase ("Plan", "Thought", "Code"), and an Action Controller dynamically prunes the action/tool space to reduce token cost and focus execution (Yuan et al., 13 Jan 2025). Hierarchical structures such as CoAct further generalize this, with a global planning agent responsible for macro-task decomposition and local agents responsible for executing subtasks and providing feedback for global replanning (Hou et al., 2024).
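The idea of an Action Controller that narrows the tool space can be illustrated with a simple relevance filter. This sketch assumes relevance is measured by word overlap between the current plan step and each tool description; `prune_tools` and the tool names are hypothetical, and PoAct's actual selection mechanism may differ.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def prune_tools(step: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Keep at most k tools whose descriptions share words with the current plan step."""
    step_toks = tokens(step)
    scored = [(len(step_toks & tokens(desc)), name) for name, desc in tools.items()]
    # Highest overlap first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


tools = {
    "search_cases": "search legal case database by keyword",
    "calc": "evaluate arithmetic expression",
    "read_statute": "read statute text by article number",
    "send_email": "send an email",
}
kept = prune_tools("search the case database for the statute article", tools)
```

Passing only the surviving tool descriptions to the executor shortens its prompt, which is the kind of token saving the Action Controller targets.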
D. RL and World-Model Approaches:
"Thinker" wraps RL agents in a model-based augmentation where agents learn to plan (via model interaction) and act (via real execution) without hand-coded planning algorithms, using UCT-style rollouts and learned world representations (Chung et al., 2023). Deliberative frameworks (Patra et al., 2020) use hierarchical operational models enabling bidirectional interleaving of planning and acting, with online Monte-Carlo planning and learning heuristics.
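The UCT-style selection rule mentioned above can be written compactly. The sketch below is the standard UCB1 score used in UCT, average return plus an exploration bonus $c\sqrt{\ln N / n(a)}$; it is a generic illustration, and neither Thinker nor the Patra et al. planner is claimed to use exactly this form or constant.

```python
import math


def uct_select(stats: dict[str, tuple[int, float]], c: float = math.sqrt(2)) -> str:
    """stats maps action -> (visit count, total return); returns the UCB1-maximizing action."""
    if not stats:
        raise ValueError("no actions to choose from")
    total_visits = sum(n for n, _ in stats.values())
    best, best_score = None, -math.inf
    for action, (n, value_sum) in stats.items():
        if n == 0:
            return action  # try every action once before trusting the bound
        score = value_sum / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best, best_score = action, score
    return best
```

With a large exploration constant a rarely tried action can win despite a lower average return; with `c=0` the rule reduces to greedy selection on the mean.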
E. Pseudocode-based Plan Synthesis:
PseudoAct uses LLMs to generate global pseudocode plans that encode sequencing, loops, and control flow for execution, supporting complex workflows, explicit termination conditions, and data dependencies. This design is intended to prevent redundant tool use and keep the plan coherent, and PseudoAct substantially outperforms reactive baselines (e.g., 88.24% vs. 60.78% accuracy on FEVER) (Yihan et al., 27 Feb 2026).
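A toy interpreter shows how a pseudocode plan with loops can be guaranteed to terminate. The tuple-based plan format, the `run` function, the toy `next_page` and `verdict` tools, and the iteration cap are hypothetical; PseudoAct's plans are LLM-generated pseudocode, not this data structure.

```python
from typing import Any, Callable

Env = dict[str, Any]  # variables shared between plan steps (data dependencies)


def run(plan: list[tuple], tools: dict[str, Callable[[Env], Any]], env: Env, cap: int = 5) -> Env:
    """Execute a tiny plan language: ("call", tool, out_var) and ("while", predicate, body)."""
    for op in plan:
        kind = op[0]
        if kind == "call":
            _, tool, out = op
            env[out] = tools[tool](env)  # store the tool result for later steps
        elif kind == "while":
            _, pred, body = op
            for _ in range(cap):  # hard iteration cap guarantees termination
                if not pred(env):
                    break
                run(body, tools, env, cap)
        else:
            raise ValueError(f"unknown op {kind!r}")
    return env


# Toy fact-checking plan: page through evidence until a condition holds, then decide.
tools = {
    "next_page": lambda e: e["page"] + 1,
    "verdict": lambda e: "SUPPORTS" if e["page"] >= 3 else "NOT ENOUGH INFO",
}
plan = [
    ("while", lambda e: e["page"] < 3, [("call", "next_page", "page")]),
    ("call", "verdict", "label"),
]
result = run(plan, tools, {"page": 0})
```

With `cap=2` the loop stops after two pages and the plan still terminates, returning `NOT ENOUGH INFO` instead of looping indefinitely.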
F. Evaluation Frameworks:
The Agent GPA framework decomposes agent evaluation into five Plan-and-Act dimensions: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence, each scored independently by LLM judges or humans to localize error modes and guide targeted improvements (Jia et al., 9 Oct 2025).
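Scoring each dimension independently makes it straightforward to localize the weakest part of a run. The sketch below assumes scores on a 0 to 1 scale and a hypothetical flagging threshold of 0.5; the `GPAScores` class is illustrative only.

```python
from dataclasses import asdict, dataclass


@dataclass
class GPAScores:
    goal_fulfillment: float
    logical_consistency: float
    execution_efficiency: float
    plan_quality: float
    plan_adherence: float

    def weakest(self) -> str:
        """Name of the lowest-scoring dimension."""
        scores = asdict(self)
        return min(scores, key=scores.get)

    def flags(self, threshold: float = 0.5) -> list[str]:
        """Dimensions scoring below the threshold, in declaration order."""
        return [name for name, value in asdict(self).items() if value < threshold]


run_scores = GPAScores(goal_fulfillment=0.9, logical_consistency=0.8,
                       execution_efficiency=0.4, plan_quality=0.7, plan_adherence=0.3)
```

Here the run largely reached its goal but drifted from its own plan, which points improvement effort at adherence rather than planning.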
3. Methodological Components
3.1 Planning Mechanisms
- Structured, Natural-Language, or Code-Based Planning: Planners output stepwise instructions, subgoals, or executable pseudocode (Erdogan et al., 12 Mar 2025, Yihan et al., 27 Feb 2026, Rawat et al., 15 May 2025).
- Global Hierarchical Plans: Macro decomposition into subtasks, with possible feedback for replanning (Hou et al., 2024).
- Probabilistic or World-Model Based Anticipation: Plans account for predicted failures or uncertainties, e.g., via LLM-predicted failure scenarios or probabilistic human behavior models (Dash et al., 23 Feb 2026).
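A simple way to let predicted failures shape plan choice is to score each candidate plan by its estimated probability of success. This sketch assumes independent per-step failure probabilities (for example, as estimated by an LLM or a behavior model) and picks the plan most likely to succeed end to end; the function names and numbers are hypothetical.

```python
import math


def success_prob(step_fail_probs: list[float]) -> float:
    # Assumes independent step failures: the plan succeeds only if every step does.
    return math.prod(1.0 - p for p in step_fail_probs)


def pick_plan(candidates: dict[str, list[float]]) -> str:
    """Return the name of the candidate plan with the highest success probability."""
    return max(candidates, key=lambda name: success_prob(candidates[name]))


candidates = {
    "direct_handover": [0.3],          # one risky step
    "announce_then_hand": [0.05, 0.1],  # two safer steps
}
chosen = pick_plan(candidates)
```

A longer plan can win when its extra step lowers the risk of the others, which is the kind of trade-off anticipation-aware planners aim to capture.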
3.2 Execution and Acting
- Action Priming and Pruning: Execution modules map plan steps to minimal, focused low-level actions, often with context-filtered tool or API selection (Yuan et al., 13 Jan 2025, Yihan et al., 27 Feb 2026).
- Dynamic Replanning and Self-Correction: Executors detect divergence between anticipated and actual feedback, adapt plans, and prevent infinite loops or redundant steps (Erdogan et al., 12 Mar 2025, Rawat et al., 15 May 2025, Yihan et al., 27 Feb 2026).
- Human-Task Integration: In human-robot collaboration, the decision of when to intervene (“passive” vs. “active” engagement) is made through a two-stage LLM policy query informed by the human's activity state and the action context (Sasabuchi et al., 1 Apr 2025).
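One simple guard against infinite loops and redundant steps is to count repeated (observation, action) pairs and trigger replanning once a pair recurs too often. `LoopGuard` and its `max_repeats` threshold are hypothetical; the cited executors may detect divergence differently.

```python
from collections import Counter


class LoopGuard:
    """Flags an action once the same (observation, action) pair exceeds max_repeats."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def check(self, observation: str, action: str) -> bool:
        """Return True if the action may proceed, False if replanning should be triggered."""
        key = (observation, action)
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats


guard = LoopGuard(max_repeats=2)
decisions = [guard.check("results page 1", "click next") for _ in range(3)]
```

The third identical attempt is refused, at which point the executor would hand control back to the planner.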
3.3 Interaction Protocols and Modularization
- Planner–Executor Interfaces: Clear separation enables modular training and optimization, e.g., LLM-based Planner trained on synthetic plans, Executor trained on action traces (Erdogan et al., 12 Mar 2025).
- Global–Local Communication: Hierarchical feedback channels allow local agents to signal failures, triggering global replanning or plan revision (Hou et al., 2024).
- External Plan Injection and Steering: Some frameworks (e.g., Paglieri et al., 3 Sep 2025) allow human-written plans to steer the agent's trajectory.
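A machine-checkable plan format makes the Planner–Executor boundary explicit. This sketch assumes a hypothetical JSON schema with `steps`, `id`, `instruction`, and `depends_on` fields and checks that every dependency refers to an earlier step; the same validation could apply to a human-written plan injected from outside.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    id: int
    instruction: str
    depends_on: tuple[int, ...] = ()


def parse_plan(raw: str) -> list[PlanStep]:
    """Parse and validate a planner's JSON output before the executor sees it."""
    data = json.loads(raw)
    steps = [PlanStep(s["id"], s["instruction"], tuple(s.get("depends_on", [])))
             for s in data["steps"]]
    seen_ids: set[int] = set()
    for step in steps:
        # Dependencies must point to earlier steps so the plan can run in order.
        missing = [d for d in step.depends_on if d not in seen_ids]
        if missing:
            raise ValueError(f"step {step.id} depends on unknown or later steps {missing}")
        seen_ids.add(step.id)
    return steps


raw_plan = ('{"steps": [{"id": 1, "instruction": "open the search page"},'
            ' {"id": 2, "instruction": "type the query", "depends_on": [1]}]}')
steps = parse_plan(raw_plan)
```

Rejecting malformed plans at this boundary lets the Planner and Executor be trained or swapped independently while keeping a fixed contract between them.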
4. Comparative Performance and Evaluation
Reported results show Plan-and-Act frameworks outperforming purely reactive, end-to-end, or CoT-only systems on long-horizon planning, compositional reasoning, and complex action environments. For example:
| Framework | Benchmark | Metric | Best Plan-and-Act | Baseline | Relative Δ (%) |
|---|---|---|---|---|---|
| "Plan-and-Act" (Erdogan et al., 12 Mar 2025) | WebArena-Lite | Success Rate | 53.94 | 36.36 (ReAct) | +48 |
| PseudoAct (Yihan et al., 27 Feb 2026) | FEVER | Accuracy | 88.24 | 60.78 (ReAct) | +45 |
| CoAct (Hou et al., 2024) | WebArena (Avg.) | Success Rate | 16.0 | 9.4 (ReAct) | +70 |
| Pre-Act (Rawat et al., 15 May 2025) | Almita (turn-level AR) | Action Recall | 0.9238 | 0.4430 (GPT-4 ReAct) | +109 |
| PoAct (Yuan et al., 13 Jan 2025) | LegalAgentBench (all) | Success Rate | 85.6 | 59.5 (ReAct) | +44 |
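The Δ column reports relative rather than absolute (percentage-point) improvement, i.e., $100 \cdot (\text{Plan-and-Act} - \text{baseline}) / \text{baseline}$. The arithmetic can be reproduced directly from the table values:

```python
def relative_gain(plan_and_act: float, baseline: float) -> float:
    """Relative improvement in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (plan_and_act - baseline) / baseline


rows = {
    "Plan-and-Act / WebArena-Lite": (53.94, 36.36),
    "PseudoAct / FEVER": (88.24, 60.78),
    "CoAct / WebArena": (16.0, 9.4),
    "Pre-Act / Almita": (0.9238, 0.4430),
    "PoAct / LegalAgentBench": (85.6, 59.5),
}
for name, (new, base) in rows.items():
    print(f"{name}: +{relative_gain(new, base):.1f}%")
```

For example, CoAct's 16.0 vs. 9.4 is an absolute gain of 6.6 points but a relative gain of about 70%.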
Ablation studies consistently attribute gains to explicit planning, modular execution, and dynamic replanning. Limiting factors include coverage of synthetic data for planners, misalignment between plan granularity and environment stochasticity, and context window/token cost for long plan/action histories.
5. Extensions and Domain-Specific Adaptations
Plan-and-Act principles are incorporated in diverse application domains:
- Web and API orchestration (Erdogan et al., 12 Mar 2025, Yihan et al., 27 Feb 2026)
- Interactive educational agents, e.g., CyberJustice Tutor with a Think–Plan–Act cognitive cycle, dynamic scaffolding, and verified curriculum retrieval (Wang et al., 19 Mar 2026)
- Human–robot interaction (Sasabuchi et al., 1 Apr 2025, Dash et al., 23 Feb 2026)
- Embodied multimodal agents (vision–language–action) (Huang et al., 22 Jul 2025)
- Fact verification and compositional QA using plan-driven RAG with multi-granularity verification (Zhang et al., 23 Apr 2025)
- Hierarchical RL and planning via learned operational models (Patra et al., 2020, Chung et al., 2023)
Adaptations include explicit plan representation languages (e.g., JSON, pseudocode, first-order logic (FOL)), planner/actor modularization, scaffolding via retrieval modules, and plug-in reward shaping for anticipation or correction.
6. Limitations and Future Directions
Current Plan-and-Act frameworks face several open challenges:
- Memory and Generalization: Rare or out-of-distribution environments may break planner or executor policies (Erdogan et al., 12 Mar 2025).
- Token and Compute Efficiency: Balancing planning frequency and action cost (Paglieri et al., 3 Sep 2025, Yihan et al., 27 Feb 2026); dynamic planning allocation is being explored as an RL objective.
- Failure Handling: Dynamic replanning mitigates but does not eliminate error propagation due to environment or action stochasticity.
- Evaluation Metrics: Recent work formalizes multidimensional agent evaluation (Goal Fulfillment, Plan Quality, Adherence) to guide further improvement (Jia et al., 9 Oct 2025).
- Integration with Retrieval and Adaptation Modules: Ongoing work targets seamless integration of retrieval-augmented generation, memory, and meta-cognitive feedback loops (Wang et al., 19 Mar 2026, Zhang et al., 23 Apr 2025).
Suggested improvements include scaling plan representations (multi-modal, multi-agent), leveraging RL for end-to-end Planner/Executor optimization, and extending evaluation frameworks to cover robustness, safety, and reference-free plan assessment.
7. Theoretical Foundations
Several Plan-and-Act systems are underpinned by formal analyses:
- Termination and Soundness: PseudoAct enforces plan termination through explicit loop predicates and hard iteration caps (Yihan et al., 27 Feb 2026).
- Belief-Space Completeness: Robot localization frameworks guarantee probabilistic completeness and soundness under belief refinements (Colledanchise et al., 2020).
- Convergence of Monte-Carlo Planning: UCT-style planners for hierarchical operational models admit asymptotic convergence to optimal choices under standard assumptions (Patra et al., 2020).
- Generalization Guarantees: Representation learning over logical target languages yields out-of-sample generalization that holds by construction (Geffner, 2021).
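Belief-space refinement can be illustrated with a set-based belief over discrete poses. This is a generic sketch, not the cited localization framework: it assumes deterministic motion and noiseless sensing in a hypothetical one-dimensional corridor, keeps only poses consistent with each observation, and picks the action whose worst-case remaining belief is smallest.

```python
from typing import Callable, Iterable


def predict(belief: set, action, move: Callable) -> set:
    """Push every candidate pose through the (assumed deterministic) motion model."""
    return {move(s, action) for s in belief}


def refine(belief: set, observed, sense: Callable) -> set:
    """Keep only the candidate poses consistent with the actual observation."""
    return {s for s in belief if sense(s) == observed}


def most_informative(belief: set, actions: Iterable, move: Callable, sense: Callable):
    """Choose the action whose worst-case refined belief is smallest."""
    def worst_case(action) -> int:
        groups: dict = {}
        for s in predict(belief, action, move):
            groups.setdefault(sense(s), set()).add(s)
        return max((len(g) for g in groups.values()), default=0)
    return min(actions, key=worst_case)


# Toy corridor: cells 0..4, a door at cell 1; the robot senses only "door" or "wall".
DOORS = {1}


def sense(s: int) -> str:
    return "door" if s in DOORS else "wall"


def move(s: int, a: int) -> int:
    return min(4, max(0, s + a))  # walls at both ends clamp the motion


b0 = refine(set(range(5)), "wall", sense)  # the robot starts next to a wall
best = most_informative(b0, [1, -1], move, sense)
```

Here moving right is preferred because at most two candidate poses survive the next observation, against three for moving left.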
These foundations support the design of robust Plan-and-Act agents for varied, dynamic, multi-step environments, and suggest promising directions for unified algorithmic design, analysis, and evaluation across cognitive AI and agentic systems.