
Plan-then-Execute LLM Agents

Updated 18 March 2026
  • Plan-then-Execute systems are architectures that explicitly decouple high-level planning from execution, using structured plans such as linear lists, hierarchies, or DAGs.
  • They employ multi-stage workflows with supervised fine-tuning, preference optimization, and synthetic data generation to ensure high-quality plan formation.
  • This paradigm enhances control-flow integrity, auditability, and failure recovery, making it ideal for complex, long-horizon, and high-assurance applications.

A plan-then-execute (PTE) paradigm in LLM agents refers to architectures that decouple high-level planning—producing an explicit, global or hierarchical plan—from the downstream execution of that plan, which is typically handled by a separate module or subagent. This separation is motivated by the need for interpretability, modularity, controllable reasoning, reliability, and the limitations of end-to-end or interleaved planning/acting (e.g., ReAct) protocols in complex, long-horizon, or high-assurance environments.

1. Formal Foundations and Core Architecture

The canonical PTE agent leverages a two-stage workflow: first, a planner module creates an explicit plan—a sequence (or DAG) of high-level steps or subgoals—based on a user goal or instruction; then, an executor module realizes each step, typically by grounding it to environment actions, tool calls, or subagent invocations.

Let $u$ denote the (possibly natural-language) user instruction. The planner $\pi_g$ emits a plan

$$p = (s_1, s_2, \dots, s_m), \qquad s_i \in \text{natural-language subgoals or structured schemas}.$$

For execution, an agent $\pi_\theta$ is conditioned on both $u$ and $p$, producing action sequences $a_t \sim \pi_\theta(\cdot \mid u, p, a_{<t}, o_{<t})$, where $a_{<t}$ and $o_{<t}$ are the histories of actions and observations. The plan may be represented as a strictly linear list (Xiong et al., 4 Mar 2025), a hierarchical structure (Chen et al., 23 Apr 2025), a DAG (Zhang et al., 12 Mar 2026), a program ("blueprint") (Qiu et al., 1 Aug 2025), or a segmentable subgoal set suitable for parallel/expert worker assignment (Amayuelas et al., 2 Apr 2025, Toda et al., 11 Jan 2026). The execution phase iterates through, or schedules, these steps while updating state and potentially relaying feedback for re-planning.
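The two-stage loop described above can be sketched in a few lines of Python; here `planner`, `executor`, and `env` are stand-ins for LLM-backed modules and the environment, and all names are illustrative rather than taken from any cited framework:

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    steps: list[str]  # s_1 ... s_m (natural-language subgoals)

@dataclass
class Trajectory:
    actions: list[str] = field(default_factory=list)       # a_1 ... a_T
    observations: list[str] = field(default_factory=list)  # o_1 ... o_T

def run_pte(instruction: str, planner, executor, env) -> Trajectory:
    """Planner emits p from u once; executor conditions each action
    on (u, p, a_<t, o_<t), as in the formulation above."""
    plan = planner(instruction)  # p ~ pi_g(u), produced before any acting
    traj = Trajectory()
    for step in plan.steps:
        # a_t ~ pi_theta(. | u, p, a_<t, o_<t)
        action = executor(instruction, plan, step,
                          traj.actions, traj.observations)
        obs = env(action)        # environment returns an observation
        traj.actions.append(action)
        traj.observations.append(obs)
    return traj
```

The key property of the pattern is visible in the structure: the plan is fixed before the loop begins, and the executor only ever refines individual steps into actions.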

This explicit decoupling stands in contrast to interleaved (e.g., ReAct (Huang et al., 2024)) or monolithic LLM-only agent architectures, which conflate reasoning and acting on a per-timestep basis.

2. Plan Representation: Abstractions and Modality

Plan representations in PTE architectures are flexible but always explicit. Common forms include:

  • Strictly linear step lists (Xiong et al., 4 Mar 2025).
  • Hierarchical structures of goals and subgoals (Chen et al., 23 Apr 2025).
  • DAGs encoding inter-step dependencies for parallel execution (Zhang et al., 12 Mar 2026).
  • Programs or "blueprints" in source-code form (Qiu et al., 1 Aug 2025).
  • Segmentable subgoal sets suited to parallel/expert worker assignment (Amayuelas et al., 2 Apr 2025, Toda et al., 11 Jan 2026).

Plans may be static (fixed before any execution) or continuously refined (re-planned after each step or failure) (Chen et al., 23 Apr 2025, Erdogan et al., 12 Mar 2025). Advanced frameworks allow plans to specify agent, tool, or subskill invocation, and may encode dependencies (e.g., DAG structure for parallel execution) (Zhang et al., 12 Mar 2026, Toda et al., 11 Jan 2026).
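A DAG-shaped plan admits batch scheduling: steps whose dependencies are complete can be dispatched to workers concurrently. A minimal sketch using Python's standard `graphlib` (the dependency dict below is an invented example, not from any cited system):

```python
from graphlib import TopologicalSorter

def dag_schedule(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Given step -> {prerequisite steps}, return batches of mutually
    independent steps that could run in parallel."""
    ts = TopologicalSorter(dependencies)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = list(ts.get_ready())  # steps whose prerequisites are done
        batches.append(sorted(ready))
        ts.done(*ready)               # mark the whole batch finished
    return batches
```

For example, a plan where `analyze` needs two fetches and `report` needs `analyze` yields three batches, with the two fetches runnable concurrently.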

3. Planning, Optimization, and Data Regimes

PTE agents require not only robust planning modules but also learning frameworks that ensure high-quality, generalizable plans.

  • Supervised Finetuning: Planners are initially trained on pairs of instructions and reference plans, written $(u, p)$ or $(q, z)$ depending on the source, using a cross-entropy loss (Erdogan et al., 12 Mar 2025).
  • Preference Optimization: Meta plans are further refined using feedback from rollouts and environment rewards. Direct Preference Optimization (DPO) or similar preference-learning objectives optimize the planner to prefer plans yielding higher empirical rewards (Xiong et al., 4 Mar 2025).
  • Synthetic Data Generation: To address data scarcity—especially of high-quality plans—researchers devise pipelines that (a) collect trajectories via demonstrators, (b) annotate with LLM or programmatic plan labels, and (c) perform augmentation and targeted expansion (Erdogan et al., 12 Mar 2025).
  • Feedback Loops: Environmental or LLM-based verification signals inform meta-plan improvement and dynamic re-planning (Chen et al., 23 Apr 2025, Zhang et al., 12 Mar 2026).

Optimization objectives thus blend supervised learning, preference learning, and closed-loop fine-tuning leveraging both human-annotated and synthetic datasets.
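The preference-optimization step can be illustrated numerically with the generic DPO objective over plan pairs. This is a sketch of the standard DPO loss, not the exact objective of the cited MPO work; the inputs are log-probabilities of a preferred plan $p^+$ and a dispreferred plan $p^-$ under the policy and a frozen reference planner:

```python
import math

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss on a (winner, loser) plan pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # numerically plain sigmoid; fine for illustration-scale margins
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy assigns relatively more probability to the plan that earned higher empirical reward, with `beta` controlling how far the planner may drift from the reference.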

4. Execution, Conditioned Reasoning, and Failure Recovery

Execution modules consume explicit plans and ground them into actions. Execution protocols include:

  • Conditioned prompting: The plan is inserted into the reasoning context for each action, with empirical results showing performance is sensitive to placement (e.g., instruction block vs. internal thought) (Xiong et al., 4 Mar 2025).
  • Structured execution: In deterministic or secure settings, plans are codified as source code or a blueprint, and executed stepwise by an engine that blocks on each atomic call; LLM invocations, tool calls, and conditionals are interleaved as dictated by the static plan, never the LLM at runtime (Qiu et al., 1 Aug 2025).
  • Hierarchical skill dispatch: High-level plan steps are mapped to specialized skill modules (searching, coding, writing, etc.), with each skill having a designated executor and interface (Chen et al., 23 Apr 2025).
  • Multi-agent and parallel models: Plans may specify assignment of tasks to multiple worker agents, supporting efficient, event-triggered concurrent execution (Amayuelas et al., 2 Apr 2025, Toda et al., 11 Jan 2026, Zhang et al., 12 Mar 2026).
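The "structured execution" protocol above can be sketched as a plan-as-code blueprint. Everything here is hypothetical (the task, function names, and tools are invented for illustration): the point is that sequencing and conditionals are fixed in source code, and the LLM appears only as an injected callable answering a bounded leaf question, never steering control flow at runtime.

```python
def refund_blueprint(order_id: str, llm, lookup_order, issue_refund) -> str:
    """Static blueprint: control flow is fixed in code; the LLM is
    confined to one bounded sub-decision."""
    order = lookup_order(order_id)            # deterministic tool call
    if order["status"] != "delivered":
        return "refused: order not delivered"
    # LLM answers a yes/no question; it cannot add or reorder steps
    reason_ok = llm(f"Is reason '{order['reason']}' refundable? yes/no") == "yes"
    if not reason_ok:
        return "refused: reason not eligible"
    issue_refund(order_id)                    # deterministic tool call
    return "refunded"
```

Because the blueprint is ordinary source code, the set of reachable action sequences is fully enumerable before execution, which is what enables the determinism and cost reductions reported for this pattern.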

Failure recovery is intrinsic to robust PTE. When off-nominal outcomes (errors, unexpected observations) arise, common responses include:

  • Relaying the failure observation back to the planner, which re-plans the unfinished portion of the task (Erdogan et al., 12 Mar 2025).
  • Using environmental or LLM-based verification signals to revise the plan before execution resumes (Chen et al., 23 Apr 2025, Zhang et al., 12 Mar 2026).
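A minimal sketch of dynamic re-planning on failure, under assumed planner/executor interfaces (none of these signatures come from the cited works): the executor reports success or failure per step, and on failure the planner receives the completed prefix plus the failure observation and emits a revised tail plan.

```python
def execute_with_replan(instruction, plan, planner, executor, max_replans=3):
    """Run plan steps in order; on a failed step, ask the planner for a
    revised tail plan, up to max_replans times."""
    steps, done, replans = list(plan), [], 0
    while steps:
        step = steps.pop(0)
        ok, obs = executor(step)      # assumed to return (success, observation)
        if ok:
            done.append((step, obs))
            continue
        if replans >= max_replans:
            raise RuntimeError(f"step failed after {max_replans} re-plans: {step!r}")
        replans += 1
        # re-plan the unfinished tail, conditioned on the failure observation
        steps = list(planner(instruction, done, failed=step, observation=obs))
    return done
```

Bounding the number of re-plans keeps the loop from oscillating; real systems would also pass accumulated observations into the planner context.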

5. Security, Interpretability, and Control

PTE patterns provide prominent architectural advantages:

  • Control-flow integrity: Fixing the plan up-front prevents tool outputs or environmental feedback from injecting unanticipated actions. This hardens against prompt-injection and non-local vulnerabilities (Rosario et al., 10 Sep 2025).
  • Least-privilege enforcement: By associating tools with plan steps, executors can be dynamically provisioned for minimal access, and sandboxed as needed (e.g., per step Docker containers) (Rosario et al., 10 Sep 2025).
  • Determinism and procedural fidelity: Encoding plans as source-code or blueprints guarantees procedural adherence, with all stochasticity confined to controlled LLM submodule invocations (Qiu et al., 1 Aug 2025).
  • Human-in-the-loop gating: PTE simplifies HITL verification—humans can approve the plan before execution or per critical step (Rosario et al., 10 Sep 2025).
  • Auditability: Traceable plan structures and execution logs facilitate failure analysis and ground specific actions or outcomes in explicit plan steps (Toda et al., 11 Jan 2026).
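The least-privilege point above can be sketched as per-step tool provisioning: each plan step declares the tools it needs, and the executor for that step only ever sees that subset. The tool registry and `provision` helper below are hypothetical:

```python
# Full tool registry (illustrative stubs standing in for real tool bindings)
ALL_TOOLS = {
    "web_search": lambda q: f"results for {q}",
    "read_file":  lambda p: f"contents of {p}",
    "send_email": lambda to, body: f"sent to {to}",
}

def provision(step_tools: set[str]) -> dict:
    """Return only the tools a plan step is entitled to; requesting an
    undeclared tool is an error rather than a silent grant."""
    unknown = step_tools - ALL_TOOLS.keys()
    if unknown:
        raise PermissionError(f"undeclared tools requested: {sorted(unknown)}")
    return {name: ALL_TOOLS[name] for name in step_tools}
```

Because the plan is fixed up-front, the tool set per step can be computed (and, e.g., a per-step sandbox configured) before any LLM output is consumed, which is what closes the prompt-injection pathway.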

6. Empirical Results and Quantitative Comparison

Empirical studies demonstrate significant benefits for PTE architectures:

| Agent/Framework | Success Rate / Key Gains | Benchmark | Notes |
|---|---|---|---|
| MPO (Xiong et al., 4 Mar 2025) | +18.3 pts (zero-shot), +5% SOTA margin | ScienceWorld, ALFWorld | Strong generalization, reduced wasted actions |
| Plan-and-Act (Erdogan et al., 12 Mar 2025) | Static: +6–10 pp; Dynamic: +16–20 pp | WebArena-Lite | Synthetic data, dynamic re-planning vital |
| Source Code Agent (Qiu et al., 1 Aug 2025) | +10.1 pp pass^1 over baseline | tau-bench | Up to –82% token/tool cost |
| GoalAct (Chen et al., 23 Apr 2025) | +12.22 pp mean improvement | LegalAgentBench | Ablations: planning, searching, coding all critical |
| Planner, multi-agent (Amayuelas et al., 2 Apr 2025) | 5.53 vs 2.28 efficiency ratio | CuisineWorld | Improved agent utilization, cost efficiency |
| CHASE (Toda et al., 11 Jan 2026) | 98.4% recall, 0.08% FPR | PyPI malware (3k pkgs) | LLM-coordinated multi-agent with verification |
| VMAO (Zhang et al., 12 Mar 2026) | 3.1→4.2 completeness (+35%) | Market research (25 queries) | Orchestration-level verification, iterative plan |

Plan-then-Execute yields (a) higher final task and subgoal success, (b) reduced execution overhead and error rate, and (c) greater generalization—particularly evident for mid-scale models and multistep or multi-agent tasks (Xiong et al., 4 Mar 2025, Erdogan et al., 12 Mar 2025, Qiu et al., 1 Aug 2025, Amayuelas et al., 2 Apr 2025, Zhang et al., 12 Mar 2026).

7. Limitations, Open Challenges, and Future Directions

Open challenges remain across several dimensions, most prominently the scarcity of high-quality plan supervision (Erdogan et al., 12 Mar 2025). Future research directions include hierarchical decomposition protocols, principled re-planning strategies, uncertainty-driven human-in-the-loop controls, and scalable multi-agent orchestration in dynamic environments.


Key references: (Xiong et al., 4 Mar 2025, Erdogan et al., 12 Mar 2025, Qiu et al., 1 Aug 2025, Chen et al., 23 Apr 2025, Amayuelas et al., 2 Apr 2025, Toda et al., 11 Jan 2026, Huang et al., 2024, Wei et al., 16 Feb 2025, Rosario et al., 10 Sep 2025, Shahnovsky et al., 13 Mar 2026, Castrillo et al., 10 Oct 2025, Hu et al., 10 Feb 2026, Aghzal et al., 15 Mar 2026, Zhang et al., 12 Mar 2026, He et al., 3 Feb 2025, Wang et al., 2024).
