
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

Published 22 Feb 2025 in cs.AI and cs.CL | (2502.16111v1)

Abstract: Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning and varying complexity of instances within a single task. Many existing methods for these tasks either perform task-level verification without considering constraints or apply inference-time algorithms without adapting to instance-level complexity. To address these limitations, we propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms--Best of N, Tree-of-Thought, and REBASE. In the PlanGEN framework, the selection agent optimizes algorithm choice based on instance complexity, ensuring better adaptability to complex planning problems. Experimental results demonstrate significant improvements over the strongest baseline across multiple benchmarks, achieving state-of-the-art results on NATURAL PLAN ($\sim$8%$\uparrow$), OlympiadBench ($\sim$4%$\uparrow$), DocFinQA ($\sim$7%$\uparrow$), and GPQA ($\sim$1%$\uparrow$). Our key finding highlights that constraint-guided iterative verification improves inference-time algorithms, and adaptive selection further boosts performance on complex planning and reasoning problems.

Summary

  • The paper presents a novel multi-agent framework, PlanGEN, that generates planning and reasoning trajectories using constraint, verification, and selection agents.
  • It employs a dynamic algorithm selection mechanism based on a modified Upper Confidence Bound policy to adapt inference strategies according to instance complexity.
  • Experimental results show state-of-the-art Exact Match and accuracy scores on NATURAL PLAN, OlympiadBench, DocFinQA, and GPQA.


Introduction

The paper introduces PlanGEN, a model-agnostic multi-agent framework for generating planning and reasoning trajectories for complex problem solving. PlanGEN targets two limitations of current agent frameworks and inference-time algorithms: they struggle to verify generated plans, and they do not adapt to varying instance-level complexity. The framework decomposes the planning task across three specialized agents (constraint, verification, and selection), each of which helps improve the performance of the inference-time algorithms it wraps. The reported experiments show state-of-the-art (SOTA) results on NATURAL PLAN, OlympiadBench, DocFinQA, and GPQA (2502.16111).

Methodology

PlanGEN Framework

PlanGEN employs a multi-agent approach with three key components:

  • Constraint Agent: Extracts instance-specific constraints such as task rules and environmental limits, laying the groundwork for verification and selection processes.
  • Verification Agent: Uses a constraint-guided iterative verification process, assigning reward scores to plans based on their adherence to constraints. This agent is critical for improving the quality of inference-time decisions.
  • Selection Agent: Dynamically selects the most suitable inference algorithm based on instance complexity using a modified Upper Confidence Bound (UCB) policy, enhancing adaptability to complex problems.
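
To make the selection step concrete, here is a minimal sketch of algorithm selection with the standard UCB1 rule. It is not the paper's modified UCB policy, whose extra terms this summary does not describe. The function names, the exploration constant `c`, and the assumption that rewards lie in [0, 1] are all illustrative.

```python
import math

# Names of the three inference-time algorithms mentioned in the summary.
ALGORITHMS = ["best_of_n", "tree_of_thought", "rebase"]


def select_algorithm(counts, mean_rewards, c=1.4):
    """Pick an algorithm with plain UCB1 (assumption: not the paper's variant)."""
    # Try every algorithm once before comparing scores.
    for name in ALGORITHMS:
        if counts[name] == 0:
            return name

    total = sum(counts.values())

    def score(name):
        # Mean reward plus an exploration bonus that shrinks with more trials.
        bonus = c * math.sqrt(math.log(total) / counts[name])
        return mean_rewards[name] + bonus

    return max(ALGORITHMS, key=score)


def update(counts, mean_rewards, name, reward):
    """Fold a new reward (e.g. a verification score) into the running mean."""
    counts[name] += 1
    mean_rewards[name] += (reward - mean_rewards[name]) / counts[name]
```

In use, `counts` and `mean_rewards` would start as dictionaries with zero entries for each algorithm; after each solved instance, the reward assigned to the resulting plan is passed to `update`. An algorithm that has been tried rarely receives a large bonus, so it is still explored even when its current mean is lower.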

PlanGEN integrates these agents with three inference-time algorithms: Best of $\mathcal{N}$, Tree-of-Thought (ToT), and REBASE, allowing it to achieve superior performance across various problem domains.

Figure 1: Schematic representation of PlanGEN (Mixture of Algorithms). An initial plan and constraints guide iterative plan refinement.
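
The interplay between candidate plans and constraint-guided verification can be sketched for the simplest of the three algorithms, Best of N. In this sketch the constraints are plain Python predicates and the reward is the fraction of constraints satisfied. Both are stand-ins chosen for illustration; in the paper the constraint and verification agents are model-driven, and the plan format, `verify`, and `best_of_n` below are hypothetical.

```python
def verify(plan, constraints):
    """Reward in [0, 1]: fraction of constraints the plan satisfies.

    Assumption: each constraint is a predicate plan -> bool.
    """
    if not constraints:
        return 1.0
    satisfied = sum(1 for check in constraints if check(plan))
    return satisfied / len(constraints)


def best_of_n(candidates, constraints):
    """Score every candidate plan and return (reward, plan) for the best one."""
    scored = [(verify(plan, constraints), plan) for plan in candidates]
    return max(scored, key=lambda pair: pair[0])


# Toy meeting-scheduling instance with hypothetical constraints.
constraints = [
    lambda p: p["start"] >= 10,            # not before 10:00
    lambda p: p["end"] <= 12,              # finished by 12:00
    lambda p: p["end"] - p["start"] == 1,  # exactly one hour long
]
candidates = [
    {"start": 9, "end": 10},
    {"start": 10, "end": 11},
    {"start": 11, "end": 13},
]

reward, plan = best_of_n(candidates, constraints)
print(reward, plan)  # prints: 1.0 {'start': 10, 'end': 11}
```

The same `verify` reward could also be the signal that drives an iterative refinement loop or a selection policy, which is the role the summary attributes to the verification agent.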

Experimental Results

PlanGEN was evaluated on NATURAL PLAN, OlympiadBench, GPQA, and DocFinQA, using Exact Match (EM) and accuracy as metrics. According to the abstract, it improves on the strongest baseline by roughly 8% on NATURAL PLAN, 4% on OlympiadBench, 7% on DocFinQA, and 1% on GPQA.

  • NATURAL PLAN: PlanGEN achieved the highest EM scores across calendar scheduling, meeting planning, and trip planning tasks, indicating its robustness in natural language planning.
  • OlympiadBench: PlanGEN showed superior performance on mathematics and physics reasoning tasks, indicating that it can handle abstract and complex reasoning challenges.
  • GPQA and DocFinQA: PlanGEN improved over the strongest baseline on both scientific reasoning (GPQA, about 1%) and financial reasoning (DocFinQA, about 7%), highlighting its versatility and model-agnostic nature.

Figure 2: Performance comparison of inference-time algorithms across different complexity levels for meeting and trip planning from NATURAL PLAN.

Analysis and Discussion

The analysis reinforces the importance of each component in PlanGEN:

  • Verification Agent: Differentiates successful from unsuccessful plans by assigning reward values based on constraint adherence.
  • Selection Agent: Adapts to varying problem complexity by choosing an appropriate algorithm for each instance, underscoring the value of instance-specific evaluation in complex planning scenarios.

A notable finding is the superior performance of the Mixture of Algorithms framework, demonstrating how dynamic algorithm selection significantly enhances problem-solving efficiency.

Conclusion

PlanGEN establishes a novel approach to planning and reasoning by leveraging multi-agent collaboration and adaptive inference-time strategies. The framework's ability to achieve SOTA across diverse benchmarks validates its design and efficacy. Future developments may explore integrating reinforcement learning for dynamic strategy optimization and extending its applicability to multi-modal and multi-lingual contexts, broadening its impact in AI research (2502.16111).
