LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

Published 30 Sep 2024 in cs.RO, cs.AI, cs.CV, cs.LG, and cs.MA | (2409.20560v2)

Abstract: Language models (LMs) possess a strong capability to comprehend natural language, making them effective in translating human instructions into detailed plans for simple robot tasks. Nevertheless, handling long-horizon tasks remains a significant challenge, especially subtask identification and allocation for cooperative heterogeneous robot teams. To address this issue, we propose a Language Model-Driven Multi-Agent PDDL Planner (LaMMA-P), a novel multi-agent task planning framework that achieves state-of-the-art performance on long-horizon tasks. LaMMA-P integrates the strengths of the LMs' reasoning capability and the traditional heuristic search planner to achieve a high success rate and efficiency while demonstrating strong generalization across tasks. Additionally, we create MAT-THOR, a comprehensive benchmark that features household tasks with two different levels of complexity based on the AI2-THOR environment. The experimental results demonstrate that LaMMA-P achieves a 105% higher success rate and 36% higher efficiency than existing LM-based multi-agent planners. The experimental videos, code, datasets, and detailed prompts used in each module can be found on the project website: https://lamma-p.github.io.


Summary

The paper demonstrates LaMMA-P's novel integration of language models with PDDL planning to efficiently allocate subtasks in multi-agent systems.
The methodology uses language models to translate human instructions into PDDL problems and a heuristic-search planner to solve them, achieving a 105% higher success rate than existing LM-based multi-agent planners.
The results indicate significant gains in efficiency (36% improvement) and robustness in long-horizon task planning compared to previous baselines.


Introduction

LaMMA-P addresses the problem of long-horizon task allocation in multi-robot systems by combining the reasoning ability of language models (LMs) with heuristic search over problems expressed in the Planning Domain Definition Language (PDDL). Traditional multi-agent planning solutions struggle with long task horizons, particularly for heterogeneous robot teams. LaMMA-P uses LMs to translate human instructions into structured task plans and to allocate subtasks efficiently across a team of cooperative agents.

Figure 1: A typical multi-agent long-horizon task in a household scenario, where Robot 1 and Robot 2 collaborate to execute tasks based on human commands given in natural language.

LaMMA-P's Modular Architecture

The framework is structured around six key modules:

Precondition Identifier (P): Assesses initial task requirements, decomposing tasks into subtasks by identifying dependencies and preconditions.
Task Allocator: Distributes subtasks across robots leveraging their capabilities, optimizing parallel execution where feasible.
Problem Generator (G): Translates task descriptions into PDDL formatted problems, encapsulating initial conditions and goals.
PDDL Validator (V): Ensures generated PDDL problems are structurally sound and executable within the simulation environment.
Fast Downward/LLM Planner: Solves each PDDL problem with the Fast Downward heuristic-search planner to produce an executable plan, and uses an LM to generate the plan instead when heuristic search does not return one.
Sub-Plan Combiner: Merges the individual sub-plans into a single coherent plan, ensuring that task dependencies are respected while maximizing operational efficiency.
Figure 2: An overview of LaMMA-P's modular architecture with its LM-integrated components orchestrating task execution.
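The Task Allocator's job of matching subtasks to robot capabilities can be illustrated with a minimal sketch. LaMMA-P performs this step with an LM; the code below instead swaps in a simple greedy rule (give each subtask to the least-loaded robot whose skills cover it). The `Robot`, `Subtask`, and `allocate` names and the skill labels are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    name: str
    skills: set[str]  # hypothetical skill labels, e.g. {"pick", "place"}


@dataclass
class Subtask:
    name: str
    required_skills: set[str]


def allocate(subtasks: list[Subtask], robots: list[Robot]) -> dict[str, list[str]]:
    """Greedy, capability-aware allocation (illustrative, not LaMMA-P's LM-based allocator).

    Each subtask goes to the robot with the fewest assignments so far among
    those whose skills cover the subtask's requirements; ties go to the robot
    listed first.
    """
    assignment: dict[str, list[str]] = {robot.name: [] for robot in robots}
    for task in subtasks:
        capable = [r for r in robots if task.required_skills <= r.skills]
        if not capable:
            raise ValueError(f"no robot can perform {task.name!r}")
        # Spreading work evenly lets independent subtasks run in parallel.
        chosen = min(capable, key=lambda r: len(assignment[r.name]))
        assignment[chosen.name].append(task.name)
    return assignment


if __name__ == "__main__":
    team = [
        Robot("robot1", {"pick", "place", "switch"}),
        Robot("robot2", {"pick", "place"}),
    ]
    jobs = [
        Subtask("switch_off_light", {"switch"}),
        Subtask("put_apple_in_fridge", {"pick", "place"}),
        Subtask("put_bowl_in_sink", {"pick", "place"}),
    ]
    print(allocate(jobs, team))
```

With this example team, only robot1 can switch off the light, so the two pick-and-place subtasks are split to keep both robots busy: robot1 gets `switch_off_light` and `put_bowl_in_sink`, robot2 gets `put_apple_in_fridge`. Load balancing is only a crude proxy for parallelism; it ignores dependencies between subtasks, travel costs, and action durations.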

Problem Formulation and Task Planning

Long-horizon tasks are formalized within a cooperative Multi-Agent Planning (MAP) framework, where a collection of agents $\mathcal{AG}$ collaboratively formulates a task execution plan. The planning challenge involves translating human instructions into a sequence of actions that collectively lead from an initial state $\mathcal{I}$ to a desired goal state $\mathcal{G}$.
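The requirement that a plan lead from $\mathcal{I}$ to $\mathcal{G}$ can be made concrete with a STRIPS-style check, the classical semantics behind basic PDDL actions. This is an illustrative sketch, not LaMMA-P's code: it assumes states are sets of ground atoms, each action carries hypothetical precondition, add, and delete sets tagged with the executing agent, goals are positive conjunctions of atoms, and actions are applied one at a time (real multi-agent execution may run them in parallel).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    agent: str               # which member of AG executes the action
    pre: frozenset[str]      # atoms that must hold before execution
    add: frozenset[str]      # atoms made true by the action
    delete: frozenset[str]   # atoms made false by the action


def apply(state: frozenset[str], action: Action) -> frozenset[str]:
    """Progress a state through one action, failing if a precondition is unmet."""
    missing = action.pre - state
    if missing:
        raise ValueError(f"{action.agent}:{action.name} is missing {sorted(missing)}")
    return (state - action.delete) | action.add


def reaches_goal(init: frozenset[str], goal: frozenset[str], plan: list[Action]) -> bool:
    """Return True if executing `plan` from `init` ends in a state containing every goal atom."""
    state = init
    for action in plan:
        state = apply(state, action)
    return goal <= state


if __name__ == "__main__":
    I = frozenset({"light_on", "apple_on_table", "fridge_closed"})
    G = frozenset({"light_off", "apple_in_fridge"})
    plan = [
        Action("switch_off", "robot1", frozenset({"light_on"}),
               frozenset({"light_off"}), frozenset({"light_on"})),
        Action("open_fridge", "robot2", frozenset({"fridge_closed"}),
               frozenset({"fridge_open"}), frozenset({"fridge_closed"})),
        Action("put_apple_in_fridge", "robot2", frozenset({"apple_on_table", "fridge_open"}),
               frozenset({"apple_in_fridge"}), frozenset({"apple_on_table"})),
    ]
    print(reaches_goal(I, G, plan))
```

Swapping the last two actions makes `apply` raise, because the fridge is not yet open; this is the kind of ordering error that explicit preconditions help a planner avoid.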

To facilitate this process, LaMMA-P uses LMs to decompose tasks into subtasks and heuristic search to plan them. The Precondition Identifier makes the preconditions of each subtask explicit, which helps the LM construct correctly ordered and efficient task sequences. This step is crucial for leveraging the LM's strengths in reasoning and understanding language semantics.
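Once dependencies between subtasks are known, they also determine which subtasks can proceed concurrently, which the Task Allocator and Sub-Plan Combiner rely on. A minimal sketch, assuming the identified dependencies are available as a mapping from each subtask to its prerequisite subtasks (the subtask names are hypothetical), groups subtasks into layers with Python's standard `graphlib.TopologicalSorter`. This layering is an illustrative stand-in, not a procedure described for LaMMA-P.

```python
from graphlib import TopologicalSorter


def parallel_layers(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into layers so every prerequisite sits in an earlier layer.

    `dependencies` maps each subtask to the subtasks that must finish first.
    Subtasks in the same layer have no ordering constraint between them and
    can be assigned to different robots to run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on circular dependencies
    layers: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        layers.append(ready)
        sorter.done(*ready)
    return layers


if __name__ == "__main__":
    deps = {
        "open_fridge": set(),
        "switch_off_light": set(),
        "put_apple_in_fridge": {"open_fridge"},
        "close_fridge": {"put_apple_in_fridge"},
    }
    for i, layer in enumerate(parallel_layers(deps)):
        print(i, layer)
```

Here `open_fridge` and `switch_off_light` form the first layer and could go to two different robots, while the fridge subtasks must follow in order.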

Experimental Evaluation

LaMMA-P is evaluated on the MAT-THOR benchmark, built on the AI2-THOR simulator. The benchmark consists of household tasks at two levels of complexity, each with a clear task allocation strategy.

Figure 3: Keyframes from the execution of tasks such as switching off lights and organizing items in the simulated environment, illustrating the coordination among robots enabled by LaMMA-P.

Results

LaMMA-P significantly outperforms the baselines, achieving a 105% higher success rate and 36% higher efficiency than the SMART-LLM baseline. Its modular architecture supports translating instructions into PDDL, allocating subtasks, and executing plans robustly across the diverse task categories in the MAT-THOR dataset. It handles complex dependencies between subtasks and produces coherent task sequences, even when tasks are communicated through vague instructions.
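Because a success rate cannot exceed 100%, the "105% higher success rate" must be a relative improvement: the new rate is about 2.05 times the baseline rate, not 105 percentage points above it. The arithmetic is sketched below with a hypothetical baseline of 0.40, which is not a number reported in the paper; the function names are also illustrative.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Percent by which `new` exceeds `baseline`, measured relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def implied_value(baseline: float, percent_gain: float) -> float:
    """Value that is `percent_gain` percent higher than `baseline`."""
    return baseline * (1.0 + percent_gain / 100.0)


if __name__ == "__main__":
    baseline_sr = 0.40                        # hypothetical baseline success rate
    new_sr = implied_value(baseline_sr, 105)  # 105% higher means 2.05x the baseline
    print(f"{new_sr:.2f}")                    # 0.82
    print(f"{relative_improvement(new_sr, baseline_sr):.1f}%")
```

A percentage-point comparison would instead report `new - baseline` directly; the two readings diverge sharply when the baseline is low.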

Conclusion

LaMMA-P represents a significant step towards effective multi-agent task planning in heterogeneous environments over long horizons. By merging LMs with PDDL planners, it dramatically improves task success rates, efficiency, and system generalizability. Future enhancements could involve refining its adaptability to dynamic environments by incorporating perception modules and enhancing re-planning capabilities, thereby widening its applicability in real-world settings.