
Improving Planning with Large Language Models: A Modular Agentic Architecture (2310.00194v4)

Published 30 Sep 2023 in cs.AI and cs.NE

Abstract: LLMs demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. Both cognitive neuroscience and reinforcement learning (RL) have proposed a number of interacting functional components that together implement search and evaluation in multi-step decision making. These components include conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of the specialized modules mentioned above, each implemented using an LLM. MAP improves planning through the interaction of specialized modules that break down a larger problem into multiple brief automated calls to the LLM. We evaluate MAP on three challenging planning tasks -- graph traversal, Tower of Hanoi, and the PlanBench benchmark -- as well as an NLP task requiring multi-step reasoning (strategyQA). We find that MAP yields significant improvements over both standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), can be effectively combined with smaller and more cost-efficient LLMs (Llama3-70B), and displays superior transfer across tasks. These results suggest the benefit of a modular and multi-agent approach to planning with LLMs.

Prefrontal Cortex-Inspired Planning Architecture for LLMs

The paper presents a prefrontal cortex-inspired architecture, LLM-PFC (the Modular Agentic Planner, MAP, of the abstract above), designed to enhance planning capabilities in LLMs. Drawing on the modular organization of the human prefrontal cortex (PFC), the authors propose an architecture built from multiple specialized GPT-4-based modules, each performing a distinct function analogous to a PFC subregion. This approach addresses well-documented shortcomings of LLMs in planning tasks, notably their struggles with multi-step reasoning and goal-directed planning.

Architecture Overview

The LLM-PFC architecture includes several key modules:

  • TaskDecomposer: Inspired by the anterior PFC, it breaks down high-level goals into manageable subgoals.
  • Actor: Analogous to the dorsolateral PFC, it proposes potential actions toward achieving subgoals.
  • Monitor: Reflecting the anterior cingulate cortex's role in conflict monitoring, it evaluates the validity of proposed actions.
  • Predictor: Based on the orbitofrontal cortex, it forecasts the next states arising from actions.
  • Evaluator: Also inspired by the orbitofrontal cortex, it assesses the value of predicted states.
  • TaskCoordinator: Drawing on the anterior PFC, it assesses whether subgoals have been achieved.

Using these modules, the architecture conducts a tree search: the Predictor simulates the states that candidate actions would produce, and the Evaluator scores those states to guide the planning process.
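A minimal sketch of how this module interaction might look in code is given below. The prompts, the `query_llm` wrapper (assumed to return already-parsed outputs such as a list of subgoals, a "yes"/"no" answer, or a numeric score), and the greedy one-step lookahead are illustrative assumptions, not the authors' implementation; the paper's architecture performs a fuller tree search over predicted states.

```python
# Sketch of the modular planning loop (module names follow the paper; prompts,
# the query_llm wrapper, and the greedy lookahead are illustrative assumptions).

def plan(goal, state, query_llm, max_steps=20, n_candidates=3):
    """Select actions module-by-module until each subgoal is satisfied.

    query_llm is a user-supplied callable that sends one prompt to an LLM and
    returns a parsed reply (list of subgoals, "yes"/"no", a state description,
    or a numeric score, depending on the prompt).
    """
    # TaskDecomposer: split the high-level goal into ordered subgoals.
    subgoals = query_llm(f"TaskDecomposer: subgoals for '{goal}' from '{state}'")
    actions_taken = []
    for subgoal in subgoals:
        for _ in range(max_steps):
            # TaskCoordinator: stop working on a subgoal once it is achieved.
            if query_llm(f"TaskCoordinator: is '{subgoal}' satisfied in '{state}'?") == "yes":
                break
            # Actor: propose several candidate actions toward the subgoal.
            candidates = [
                query_llm(f"Actor: action {i} for '{subgoal}' in '{state}'")
                for i in range(n_candidates)
            ]
            # Monitor: reject actions judged invalid (e.g. illegal moves).
            valid = [
                a for a in candidates
                if query_llm(f"Monitor: is '{a}' valid in '{state}'?") == "yes"
            ]
            if not valid:
                continue  # every proposal was rejected; re-sample
            # Predictor + Evaluator: simulate each valid action and score it.
            scored = []
            for action in valid:
                next_state = query_llm(f"Predictor: state after '{action}' in '{state}'")
                value = float(query_llm(f"Evaluator: value of '{next_state}' for '{subgoal}'"))
                scored.append((value, action, next_state))
            # Greedy one-step choice; the paper searches deeper over predicted states.
            _, best_action, state = max(scored)
            actions_taken.append(best_action)
    return actions_taken
```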

Empirical Evaluation

The LLM-PFC architecture was tested on two complex tasks: graph traversal and Tower of Hanoi (ToH).

1. Graph Traversal Tasks: Using the CogEval protocol, the architecture was evaluated on tasks such as Valuepath and Steppath. LLM-PFC achieved nearly perfect performance, significantly outperforming standard GPT-4 approaches; it solved 100% of Valuepath problems without proposing any invalid actions, demonstrating both planning accuracy and efficiency.

2. Tower of Hanoi: On the ToH task, the architecture achieved a nearly sevenfold improvement over zero-shot performance on 3-disk problems and showed notable performance on 4-disk problems, which served as a test of out-of-distribution generalization (see the sketch below for an illustrative state encoding).
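To make the Monitor's role concrete, the sketch below shows a simple Tower of Hanoi state encoding and the kind of move-validity check the Monitor must enforce. The tuple-of-pegs representation and the function names are a simplification for illustration, not the paper's prompt format.

```python
# Illustrative Tower of Hanoi encoding (an assumed simplification, not the
# paper's format): each peg is a stack of disk sizes, largest at the bottom.
# A Monitor-style check rejects moves that take from an empty peg or place
# a larger disk on a smaller one.

State = tuple[tuple[int, ...], ...]  # e.g. ((3, 2, 1), (), ()) for 3 disks

def is_valid_move(state: State, src: int, dst: int) -> bool:
    if not state[src]:                      # nothing to move from src
        return False
    disk = state[src][-1]
    return not state[dst] or state[dst][-1] > disk  # no larger-on-smaller

def apply_move(state: State, src: int, dst: int) -> State:
    pegs = [list(p) for p in state]
    pegs[dst].append(pegs[src].pop())
    return tuple(tuple(p) for p in pegs)

start = ((3, 2, 1), (), ())
assert is_valid_move(start, 0, 1)                      # move disk 1 to peg B: legal
assert not is_valid_move(start, 1, 2)                  # peg B is empty: illegal
assert apply_move(start, 0, 2) == ((3, 2), (), (1,))   # smallest disk moved to peg C
```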

Ablation Study and Baselines

The paper examines the contribution of each module through ablations, with the Monitor identified as particularly crucial. Removing any single module typically reduced problem-solving performance, underscoring the interdependence of the architecture's components.

Comparative analyses against zero-shot and in-context learning baselines using GPT-4 showed substantially higher task success rates and fewer invalid actions, supporting the integrated multi-module approach.

Implications and Future Directions

The LLM-PFC architecture introduces a structured method for organizing planning in LLMs, suggesting significant potential for applications requiring complex reasoning. Its modular design, reflective of neurobiological insights from the PFC, opens avenues for further enhancements in artificial intelligence, particularly in tasks demanding flexible, multi-step planning capabilities.

Future research could explore joint fine-tuning of modules to further enhance performance and extend capabilities to broader task domains. Additionally, investigating white-box methods could optimize module specialization and reduce reliance on prompt engineering, paving the way for broader applicability of the architecture across diverse cognitive tasks.

Conclusion

This research suggests that biologically inspired approaches can enrich AI systems' planning and reasoning capacities by decomposing these tasks into specialized, interacting components. The LLM-PFC demonstrates promising advancements in planning efficacy for LLMs, encouraging directions in AI development that more closely mimic human cognitive architectures.

Authors (3)
  1. Taylor Webb
  2. Shanka Subhra Mondal
  3. Ida Momennejad