- The paper presents PreAct, which enhances planning by integrating future observation predictions into the LLM agent loop.
- It demonstrates improvements over ReAct with up to 20% success rate gains and increased reasoning diversity across various tasks.
- The framework shows additive benefits when combined with Reflexion, paving the way for more robust LLM agent training.
PreAct: Integrating Prediction into Agent Planning with LLMs
Motivation and Background
Large language models (LLMs) have greatly expanded agents' capabilities for stepwise reasoning and interactive problem-solving in real-world environments. ReAct [yao2022react] operationalizes this by prompting LLMs to iteratively generate thoughts and actions informed by ongoing observations, building a trajectory that sequentially refines plans. While ReAct successfully enables problem decomposition and environmental adaptation, its reasoning pathways are often narrowly causal, limiting strategy diversity and adaptability in complex scenarios.
Recent approaches such as Tree-of-Thought (ToT) [yao2023tree] and Graph-of-Thought (GoT) [besta2023graph] address this by generating multiple candidate actions at each decision step. These paradigms, however, face operational challenges in real-world environments, where executing divergent actions in parallel is infeasible. Reflexion [shinn2023reflexion] further enriches agent memory by triggering post hoc reflection on failures and updating history to mitigate repeated mistakes, yet it does not directly address prospective prediction for plan correction.
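For context, the ReAct-style thought–action–observation loop that PreAct extends can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` and `env_step` are hypothetical stubs standing in for a real language-model call and environment interface.

```python
# Minimal sketch of a ReAct-style agent loop.
# `llm` and `env_step` are hypothetical stubs for illustration only.

def llm(prompt: str) -> str:
    """Stub LLM: emits a canned thought/action pair based on step count."""
    step = prompt.count("Observation:")
    if step == 0:
        return "Thought: I should look around.\nAction: look"
    return "Thought: The task is done.\nAction: finish"

def env_step(action: str) -> tuple[str, bool]:
    """Stub environment: returns (observation, done)."""
    if action == "finish":
        return "Task complete.", True
    return "You see a table.", False

def react_loop(task: str, max_steps: int = 5) -> list[dict]:
    """Iterate thought -> action -> observation until done or step limit."""
    history = f"Task: {task}\n"
    trajectory = []
    for _ in range(max_steps):
        response = llm(history)                      # generate thought + action
        action = response.split("Action:")[-1].strip()
        observation, done = env_step(action)         # act in the environment
        history += f"{response}\nObservation: {observation}\n"
        trajectory.append({"action": action, "observation": observation})
        if done:
            break
    return trajectory
```

Each iteration appends the new thought, action, and observation to the running history, which becomes the prompt for the next step; this is the single-pathway loop whose narrow causality PreAct aims to diversify.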
PreAct Framework
PreAct introduces explicit future observation prediction into the LLM agent planning loop. At each step, in addition to generating thought and action, the agent predicts possible observations and corresponding contingencies. This prediction is integrated into the agent's history and leveraged for future reasoning, enabling the agent to compare actual observations against predictions and reflect on mismatches. The framework supports three history retention modes:
- Permanent Mode: All predictions are retained in the agent’s episodic memory.
- Immediate Mode: Only the latest prediction is preserved.
- Reflexion Mode: Combines Reflexion reflections and all predictions.
By prompting the LLM to anticipate future outcomes and update plans based on the discrepancies between prediction and reality, PreAct promotes a more diversified and strategically oriented reasoning process.
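The prediction-augmented step and the three history retention modes described above can be sketched as follows. This is an illustrative sketch under assumptions: the class name, field layout, and prompt rendering format are hypothetical, not the paper's actual code.

```python
# Sketch of PreAct's history handling across the three retention modes.
# All names and formats here are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class PreActHistory:
    mode: str = "permanent"    # "permanent" | "immediate" | "reflexion"
    steps: list = field(default_factory=list)        # (thought, action, observation)
    predictions: list = field(default_factory=list)  # predicted observations
    reflections: list = field(default_factory=list)  # Reflexion-style reflections

    def add_step(self, thought, action, prediction, observation):
        """Record one agent step; retain predictions per the chosen mode."""
        self.steps.append((thought, action, observation))
        if self.mode == "immediate":
            self.predictions = [prediction]          # keep only the latest prediction
        else:
            self.predictions.append(prediction)      # permanent / reflexion: keep all

    def render(self) -> str:
        """Serialize history into the context for the next LLM call."""
        parts = [f"Thought: {t}\nAction: {a}\nObservation: {o}"
                 for t, a, o in self.steps]
        parts += [f"Prediction: {p}" for p in self.predictions]
        if self.mode == "reflexion":
            parts += [f"Reflection: {r}" for r in self.reflections]
        return "\n".join(parts)
```

Because retained predictions appear alongside actual observations in the rendered context, the model can compare the two at the next step and revise its plan when they diverge.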
Experimental Analysis
PreAct is empirically evaluated using GPT-3.5 and GPT-4 across four datasets sourced from AgentBench [liu2023agentbench]: Householding (Alfworld), Operating System (OS), Database (DB), and Lateral Thinking Puzzles (LTP). The metrics focus on Success Rate (SR) and a normalized guessing percentage for LTP.
Comparative Results
PreAct consistently outperforms ReAct in complex task settings across all examined datasets. In the Alfworld Householding task, PreAct yields a roughly 20% absolute improvement over ReAct, with additional gains observed in the OS and DB tasks (up to 12% and 8%, respectively, across settings). Integrating Reflexion with PreAct drives further performance gains, suggesting additive benefits from combining retrospective self-assessment with prospective prediction. In the LTP domain, PreAct's advantage is less pronounced, likely because the puzzles' toxic content triggers GPT's refusal mechanisms.
Diversity and Strategic Directivity
Trajectory-level evaluation shows that PreAct substantially increases reasoning diversity: on average, 45% of instances exhibit higher diversity than under ReAct, and strategic-directivity metrics (computed via scoring prompts) indicate at least a 20% edge in directional planning for PreAct agents. Case studies confirm that PreAct recovers from initial mistakes faster (e.g., correcting DB queries or switching search locations in Alfworld) and generalizes plans more robustly by leveraging future-prediction signals.
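The paper scores diversity and directivity with LLM scoring prompts; as a purely illustrative proxy (not the paper's metric), trajectory-level reasoning diversity could be approximated by how varied the wording of an agent's thoughts is, e.g. via a distinct-bigram ratio:

```python
# Illustrative proxy for reasoning diversity, NOT the paper's metric:
# the fraction of unique word bigrams across a trajectory's thoughts.

def thought_diversity(thoughts: list[str]) -> float:
    """Higher values indicate less repetitive (more diverse) reasoning."""
    bigrams = []
    for t in thoughts:
        words = t.lower().split()
        bigrams.extend(zip(words, words[1:]))   # adjacent word pairs
    if not bigrams:
        return 0.0
    return len(set(bigrams)) / len(bigrams)
```

A trajectory that keeps repeating the same reasoning phrase scores low, while one that explores different hypotheses at each step scores near 1.0; an LLM-as-judge scoring prompt plays the analogous role in the paper's evaluation.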
Role of Prediction History
Ablation analyses show that longer-term retention of prediction history correlates strongly with task success: persistent prediction histories yield sustained improvements in reasoning efficacy (e.g., GPT-4's Alfworld success rate rises from 66% to 74% as the prediction history is extended). The effect of historical predictions is uniformly positive except where model refusal rates increase, as in LTP.
Theoretical and Practical Implications
PreAct operationalizes a process that aligns LLM agent reasoning more closely with scientific inquiry: predict, act, compare outcomes, and rectify plans. Prediction-guided reasoning also creates new axes for evaluating agent planning capacity, namely reasoning diversity and directional strategy. By jointly exploiting prospective information (predictions) and retrospective information (Reflexion's reflections), agents exhibit more sophisticated behaviors than those engineered solely via stepwise reasoning or static memory augmentation.
Evaluating planning along diversity and strategic directivity metrics enables more nuanced reward shaping for reinforcement learning in agent training. It also sets a foundation for integrating even richer forms of long-term memory beyond Reflexion, such as example-based or experience-based memory.
Limitations and Future Directions
PreAct focuses exclusively on short-term episodic memory, primarily via history and Reflexion. Future research should investigate PreAct's interaction with hierarchical or externalized long-term memories. Current experiments rely solely on prompt-based adaptation; fine-tuning models on PreAct trajectories may reveal further intrinsic mechanisms underlying the enhanced reasoning. Additionally, LLM refusal behavior in toxic or edge-case environments, together with underlying biases and hallucinations, remains a persistent challenge.
Conclusion
PreAct extends LLM agent planning by integrating predictive reasoning into the action loop, increasing both the diversity and the strategic directivity of plans. Experimental results demonstrate substantial and sustained improvements over ReAct, with further gains realized through synergy with Reflexion memory. The PreAct framework introduces new process-level metrics for evaluating and optimizing planning-oriented agents, laying groundwork for more advanced agent architectures and richer evaluative paradigms in future AI research (arXiv:2402.11534).