PreAct Framework: Multi-Step LLM Agents
- The PreAct framework is a multi-step planning system for LLM agents that decouples planning from acting to ensure global coherence over long-horizon tasks.
- It employs an iterative process where an initial detailed plan is dynamically refined using tool feedback and predictive reasoning for enhanced adaptability.
- Its modular design—integrating planner, executor, refiner, and finalizer—yields significant performance gains in action recall and goal completion compared to ReAct.
The Pre-Act (PreAct) framework designates a family of LLM-centric agent architectures that augment typical reasoning–action cycles with explicit, plan-first multi-step reasoning and dynamic re-planning. Pre-Act enables LLM agents to maintain global coherence over long-horizon tasks by first drafting an explicit, stepwise execution plan, in which each step is paired with detailed reasoning, and then iteratively refining this plan as observations from tool calls or environment feedback accumulate. The paradigm originated as a response to limitations in the ReAct (Reasoning + Acting) approach, where the agent interleaves a single Thought and Action per turn and can therefore fail on complex, multi-step, or out-of-domain scenarios. Subsequent Pre-Act variants add prediction and strategic handling, supporting more robust and adaptive agentic behavior (Rawat et al., 15 May 2025; Fu et al., 2024).
1. Motivation and Distinguishing Features
Pre-Act extends ReAct by separating planning and acting: rather than reasoning only about the immediate next step, the agent first generates a structured multi-step execution plan based on the input request, with each step ($s_i$) comprising both a "thought" ($\phi_i$) and an action ($a_i$). This plan is not static; after each step is executed and tool outputs or feedback ($o_i$) are received, the agent revisits and refines the remainder of the plan in light of new evidence, updating both future actions and their supporting rationale.
Key features:
- Global multi-step coherence: The upfront plan encourages a high-level perspective, enabling the agent to preserve long-horizon goals, rather than devolving into myopic local reasoning.
- Incremental refinement: Each tool interaction result becomes part of an accumulating context, supporting error correction and adaptive strategies.
- Data-efficient learning: Fine-grained, stepwise plan-annotation enables even smaller LLMs to learn explicit structured reasoning from trajectories.
- Prediction-enriched reasoning: In some variants, agents explicitly predict possible feedback or world responses at each step, using these predictions to anticipate contingencies and enrich follow-up planning (Fu et al., 2024).
2. Formal Multi-Step Planning and Updating Loop
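Formally, given a user request $x$, the Pre-Act agent generates a plan

$$S = [s_1, \dots, s_n, s_{fa}], \qquad s_i = (\phi_i, a_i),$$

where $s_{fa}$ denotes the final answer step. The agent maintains an internal context

$$C_i = [(a_1, o_1), \dots, (a_i, o_i)], \qquad C_0 = [\,],$$

and updates

$$C_i = C_{i-1} \oplus [(a_i, o_i)]$$

after execution and observation, where $\oplus$ denotes list concatenation and $o_i$ is empty when $a_i$ is not a tool call. At each iteration, the agent reads the current step $(\phi_i, a_i) = s_i$ and, upon obtaining $o_i$, refines the remaining steps:

$$S_{i+1:} \leftarrow \mathrm{Refine}(x, C_i, S_{i+1:}).$$

The high-level control loop can be summarized as: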
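```python
def PreActAgent(x):
    # LLM_Plan, LLM_Refine, CALL_TOOL and is_tool_call are placeholders
    # for the model and tool interfaces; they are not defined here.
    C = []                       # accumulated (action, observation) context
    P = LLM_Plan(x)              # draft plan S = [s_1, ..., s_n, s_fa]
    i = 0
    while i < len(P) - 1:        # every step except the final-answer step
        phi_i, a_i = P[i]        # thought and action for the current step
        o_i = CALL_TOOL(a_i) if is_tool_call(a_i) else None
        C.append((a_i, o_i))
        # Refine the remaining steps (including s_fa) given the new context.
        P[i + 1:] = LLM_Refine(x, C, P[i + 1:])
        i += 1
    phi_fa, a_fa = P[-1]         # final-answer step
    return a_fa
```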
3. Modular Architecture and Data Flow
Pre-Act decomposes into four principal modules:
- Planner: Produces an initial draft plan (steps $s_1, \dots, s_n$ plus the final answer step $s_{fa}$) from the input request.
- Executor: Executes steps, triggering tool APIs or environment actions as specified, capturing observations.
- Reasoning Refiner: Incorporates the updated context at each step, invoking re-planning routines such that subsequent steps adapt to actual outcomes (e.g., tool errors, environment changes).
- Finalizer: Aggregates reasoning chains and tool outputs to produce the agent's final answer.
Data Flow Table:
| Module | Input | Output |
|---|---|---|
| Planner | Request $x$ | Draft plan |
| Executor | Step $s_i$, accumulated context $C$ | Action execution, observation $o_i$ |
| Refiner | Context $C$, remaining plan | Refined draft steps |
| Finalizer | Complete context, thoughts, tool outputs | Final answer |
The data flow sequence is: input request → Planner (drafts plan) → Executor (steps, feedback) → Refiner (plan updates) → Finalizer (answer).
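The data flow above can be sketched as four pluggable callables wired together by a small driver. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PreActPipeline`, the callable signatures, and the toy order-lookup stubs are all hypothetical, and a real system would back each module with LLM and tool calls. The sketch also assumes the planner always returns at least the final answer step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Step = tuple[str, str]  # (thought, action); hypothetical representation


@dataclass
class PreActPipeline:
    planner: Callable[[str], list]             # request -> draft plan
    executor: Callable[[str], Optional[str]]   # action -> observation
    refiner: Callable[[str, list, list], list] # request, context, rest -> rest
    finalizer: Callable[[str, list, Step], str]
    context: list = field(default_factory=list)

    def run(self, request: str) -> str:
        plan = self.planner(request)
        i = 0
        while i < len(plan) - 1:  # last entry is the final answer step
            _thought, action = plan[i]
            observation = self.executor(action)
            self.context.append((action, observation))
            # Let the refiner rewrite everything that has not run yet.
            plan[i + 1:] = self.refiner(request, self.context, plan[i + 1:])
            i += 1
        return self.finalizer(request, self.context, plan[-1])


# Toy stand-ins for the four modules (assumptions for illustration only).
def toy_planner(request):
    return [("Need the order status", "lookup_order"),
            ("Report back to the user", "final_answer")]


def toy_executor(action):
    return {"lookup_order": "shipped"}.get(action)


def toy_refiner(request, context, remaining):
    last_observation = context[-1][1]
    return [(f"Observed: {last_observation}", a) for _, a in remaining]


def toy_finalizer(request, context, final_step):
    return f"Your order is {context[-1][1]}."


pipeline = PreActPipeline(toy_planner, toy_executor, toy_refiner, toy_finalizer)
answer = pipeline.run("Where is my order?")
print(answer)
```

Keeping the modules as plain callables makes it easy to swap a stub for a model-backed implementation without touching the driver loop.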
4. Evaluation Framework: Turn-Level and End-to-End
To comprehensively quantify agent performance, Pre-Act introduces a two-level evaluation framework:
Level 1 – Turn-Level: Measures stepwise competence.
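- Action Recall: Proportion of ground-truth actions (tool calls or final answers) that the agent's predictions match: $\text{Action Recall} = \dfrac{\#\,\text{correctly predicted actions}}{\#\,\text{ground-truth actions}}$.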
- Tool-Call F1: Precision/recall/F1 for tool-call parameter matching (where applicable).
- Final Answer Similarity: F1 and semantic similarity (via sentence embeddings) on natural language answers.
Level 2 – End-to-End: Assesses holistic scenario success.
- Tasks are factored into milestones with directed dependencies.
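- Goal Completion Rate (GC): Fraction of milestones completed in valid order, i.e. $\text{GC} = \dfrac{\#\,\text{milestones completed in valid order}}{\#\,\text{milestones}}$.
- Progress Rate (PR): Length of the longest valid prefix of milestones, normalized by the total number of milestones.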
This dual-level evaluation allows disentanglement of local missteps from global task planning failures (Rawat et al., 15 May 2025).
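The metric definitions above can be made concrete with a short sketch. It is only one plausible reading of the prose: the turn-aligned exact-match rule for Action Recall, the dictionary encoding of milestone dependencies, and the function names are assumptions made for illustration, and the source's exact scoring rules may differ.

```python
def action_recall(predicted, gold):
    """Share of ground-truth actions matched by the prediction at the same turn."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def completed_in_valid_order(achieved, deps):
    """Keep a milestone only if all of its prerequisites were completed earlier.

    achieved: milestones in the order the agent reached them.
    deps: milestone -> set of prerequisite milestones (a DAG).
    """
    done = set()
    for milestone in achieved:
        if milestone in deps and deps[milestone] <= done:
            done.add(milestone)
    return done


def goal_completion(achieved, deps):
    return len(completed_in_valid_order(achieved, deps)) / len(deps)


def progress_rate(achieved, deps, order):
    """Longest prefix of the reference order that was validly completed."""
    done = completed_in_valid_order(achieved, deps)
    prefix = 0
    for milestone in order:
        if milestone not in done:
            break
        prefix += 1
    return prefix / len(order)


# Hypothetical scenario with four milestones: m2 needs m1, m4 needs m3.
deps = {"m1": set(), "m2": {"m1"}, "m3": set(), "m4": {"m3"}}
order = ["m1", "m2", "m3", "m4"]
achieved = ["m1", "m3", "m4"]  # the agent skipped m2

recall = action_recall(["search", "book", "final_answer"],
                       ["search", "pay", "final_answer"])
gc = goal_completion(achieved, deps)
pr = progress_rate(achieved, deps, order)
print(recall, gc, pr)
```

In this toy scenario GC is 0.75 while PR is 0.25: most milestones were reached, but the reference sequence breaks early at `m2`. That gap is the kind of distinction the two end-to-end metrics are meant to expose.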
5. Comparative Empirical Results
Pre-Act demonstrates substantial improvements over ReAct and, when paired with fine-tuned open models, can outperform even flagship models such as GPT-4 on benchmark datasets:
| Model | Action Recall (Almita) | Goal Completion (GC) | Progress Rate (PR) |
|---|---|---|---|
| ReAct (avg; 5 LLMs) | ≈ 0.35 | -- | -- |
| Pre-Act (avg; 5 LLMs) | ≈ 0.60 | -- | -- |
| Llama 70B (Pre-Act, fine-tuned) | 0.8861 | 0.82 | 0.89 |
| GPT-4 (Pre-Act) | 0.8201 | 0.64 | 0.76 |
| GPT-4 (ReAct) | -- | 0.32 | 0.50 |
- Turn-level: Pre-Act outperforms ReAct by ≈ 70% in relative terms in Action Recall (≈ 0.60 vs. ≈ 0.35, averaged across five models; Almita dataset).
- Llama 70B (fine-tuned with Pre-Act): Achieves a reported 69.5% improvement in Action Recall over GPT-4 on Almita single-turn action prediction and surpasses GPT-4 (Pre-Act) by 28% in relative terms in end-to-end Goal Completion (0.82 vs. 0.64).
- Smaller models (Llama 8B, fine-tuned): Approach SOTA Action Recall (≈ 0.827), at ~2× speed and ~5× cost reduction compared to Llama 70B.
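As a quick consistency check, and assuming the percentages are relative gains computed from the rounded values in the table, the arithmetic works out as follows:

```latex
\begin{align*}
\text{Action Recall, Pre-Act vs. ReAct (avg.)}: &\quad \frac{0.60 - 0.35}{0.35} \approx 0.71 \\
\text{GC, Llama 70B vs. GPT-4 (both Pre-Act)}: &\quad \frac{0.82 - 0.64}{0.64} \approx 0.28 \\
\text{GC, GPT-4 Pre-Act vs. GPT-4 ReAct}: &\quad \frac{0.64 - 0.32}{0.32} = 1.00
\end{align*}
```

The first two lines match the ≈ 70% and 28% figures quoted above. The third shows that, on these numbers, Pre-Act prompting alone doubles GPT-4's Goal Completion.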
6. Prediction-Augmented Pre-Act Variants
Another PreAct formulation (Fu et al., 2024) augments the classic reasoning-action loop with a prediction step. At each reasoning turn, the agent anticipates possible environment or tool feedback for the action it is about to take (“PREDICTED FEEDBACK”). This prediction is not itself executed but serves to expand the reasoning alternatives:
- Supports agent self-reflection by explicitly enumerating possible contingencies.
- Enables downstream handling plans to be conditioned on each prediction, improving diversity and strategic orientation.
Empirical analysis reveals that providing LLM agents with historical predictions consistently broadens planning scope and improves action effectiveness.
Illustrative block:
```
THOUGHT: The fridge has no lettuce, so my assumption about its storage was incorrect. …
ACTION: open pantry door
PREDICTED FEEDBACK:
1. “You see a head of lettuce…” — Handling: pick up …
2. “There is no lettuce here…” — Handling: reflect, search elsewhere
3. “The pantry door is stuck…” — Handling: grab tool, retry
```
This approach enhances ReAct by injecting deliberate counterfactual reasoning, supporting richer agent capability on intricate tasks.
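One way to picture how predicted feedback could be used after the action runs is to match the actual observation against the enumerated predictions and pick the corresponding handling. The sketch below is an assumption-laden illustration: in the source the LLM itself reasons over its earlier predictions, whereas here a plain string-similarity score from Python's `difflib` stands in for that judgment, and the `Prediction` class, `select_handling` function, and 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Prediction:
    feedback: str  # anticipated observation text
    handling: str  # planned response if that observation occurs


def select_handling(observation, predictions, threshold=0.5):
    """Return the handling whose predicted feedback best matches the observation.

    Returns None when no prediction is similar enough, signalling that the
    agent should fall back to fresh reasoning instead of a prepared handling.
    """
    best, best_score = None, 0.0
    for prediction in predictions:
        score = SequenceMatcher(
            None, observation.lower(), prediction.feedback.lower()
        ).ratio()
        if score > best_score:
            best, best_score = prediction, score
    if best is None or best_score < threshold:
        return None
    return best.handling


# Predictions taken from the illustrative block (elided parts left out).
predictions = [
    Prediction("You see a head of lettuce", "pick up"),
    Prediction("There is no lettuce here", "reflect, search elsewhere"),
    Prediction("The pantry door is stuck", "grab tool, retry"),
]

chosen = select_handling("There is no lettuce here.", predictions)
print(chosen)
```

Returning `None` for unmatched observations keeps the prepared handlings from being applied to feedback the agent never anticipated.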
7. Practical Tuning, Latency, and Deployment
Pre-Act’s increased up-front and ongoing token generation for explicit multi-step plans impacts inference cost and latency, particularly for large models. Deployment optimization strategies:
- Model selection: Smaller models (Llama 8B fine-tuned for Pre-Act) reduce latency by roughly 2× and cost by roughly 5× relative to Llama 70B, with only minor performance degradation (Action Recall of ≈ 0.827 vs. 0.8861).
- Fine-tuning protocol: Use curriculum learning: pretrain on ReAct-formatted corpora, then incrementally introduce high-quality Pre-Act trajectories via parameter-efficient LoRA, minimizing catastrophic forgetting.
- Prompt engineering: Structure inputs to require enumerated “Thought:” and “Action:” pairs, mandate explicit plan refinement, and maintain context accumulation.
- Plan sampling: For robust execution, generate several candidate plans (the number of candidates playing the role of a “beam width”) and select among them via a self-consistency criterion.
- Milestone graphing: Precompute scenario milestone DAGs (with human validation) to ensure deterministic, reproducible end-to-end evaluation (Rawat et al., 15 May 2025).
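The plan-sampling strategy in the list above can be sketched as a majority vote over sampled plans. The source does not specify the self-consistency criterion, so this is an assumption: plans are treated as agreeing when their action sequences are identical, thoughts are ignored, and the function name `select_plan` and the flight-booking candidates are hypothetical.

```python
from collections import Counter


def select_plan(candidates):
    """Pick the candidate whose action sequence is most common among samples.

    candidates: list of plans, each a list of (thought, action) pairs.
    Returns the chosen plan and the number of samples that agree with it.
    """
    if not candidates:
        raise ValueError("need at least one candidate plan")
    # Compare plans by their action sequence only.
    signatures = [tuple(action for _, action in plan) for plan in candidates]
    winner, votes = Counter(signatures).most_common(1)[0]
    # Return the first sampled plan that has the winning action sequence.
    return candidates[signatures.index(winner)], votes


# Three hypothetical samples for the same request; two agree on the actions.
candidates = [
    [("Find options first", "search_flights"),
     ("Then reserve", "book_flight"),
     ("Confirm to user", "final_answer")],
    [("Book directly", "book_flight"),
     ("Confirm to user", "final_answer")],
    [("Look up flights", "search_flights"),
     ("Reserve the best one", "book_flight"),
     ("Tell the user", "final_answer")],
]

plan, votes = select_plan(candidates)
print([action for _, action in plan], votes)
```

A low vote count relative to the number of samples can itself serve as a signal that the request is ambiguous and that more samples, or a clarifying question, may be warranted.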
Pre-Act thus delivers an architecture for agentic LLMs that bridges generalization gaps in out-of-domain tasks, yielding evaluation gains on both stepwise and end-to-end axes, and enabling resource-efficient deployment in practical workflows.