
Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy (2503.10049v1)

Published 13 Mar 2025 in cs.CV

Abstract: Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of LLMs has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.

Authors (4)
  1. Ziqi Jia (3 papers)
  2. Junjie Li (98 papers)
  3. Xiaoyang Qu (41 papers)
  4. Jianzong Wang (144 papers)

Summary

LGC-MARL: Enhancing MAS via LLM Planning and Graph-based Policies

The paper "Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy" (Jia et al., 13 Mar 2025 ) introduces LGC-MARL, a novel framework designed to enhance multi-agent system (MAS) performance through the integration of LLMs and multi-agent reinforcement learning (MARL). The core motivation stems from the challenges inherent in MAS, particularly in complex task execution, agent coordination, and ensuring safety, compounded by the difficulties in reward function design within MARL. While LLMs offer enhanced reasoning and cognitive capabilities, their application in dynamic environments has been limited by response time and accuracy issues. LGC-MARL addresses these limitations by decomposing complex tasks into subtasks and enabling efficient collaboration through graph-based coordination.

LGC-MARL Architecture and Functionality

LGC-MARL consists of two primary components: an LLM planner and a graph-based collaboration meta-policy.

LLM Planner

The LLM planner is responsible for transforming complex task instructions into a structured series of executable subtasks. This decomposition leverages the LLM's reasoning capabilities to create a sequential plan that agents can follow. Crucially, the planner incorporates a critic model to evaluate the rationality and feasibility of the generated subtasks, ensuring the plan is coherent and executable within the environment's constraints. Furthermore, the LLM planner generates an action dependency graph, which explicitly defines the dependencies between different subtasks and, consequently, the required interactions between agents. This graph serves as a blueprint for coordination, dictating which agents need to communicate and collaborate to achieve the overall task objective.
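The paper does not publish code, but the plan-then-critique loop can be illustrated with a minimal sketch. Here `call_llm` is a hypothetical text-in/text-out wrapper around any chat-completion API, and the prompt formats and JSON schema are illustrative assumptions, not the authors' actual prompts:

```python
# Minimal sketch of an LLM planner with a critic loop, assuming a generic
# text-in/text-out LLM wrapper. Prompts and the JSON plan schema are
# illustrative only; the paper does not specify them.
import json
from typing import Callable

def call_llm(prompt: str) -> str:
    """Hypothetical LLM wrapper; replace with a real chat-completion call."""
    raise NotImplementedError

def plan_task(instruction: str, max_retries: int = 3,
              llm: Callable[[str], str] = call_llm) -> dict:
    """Decompose an instruction into subtasks plus an action dependency graph."""
    feedback = ""
    for _ in range(max_retries):
        # 1. Planner role: ask the LLM for subtasks and their dependencies.
        plan = json.loads(llm(
            "Decompose the task into executable subtasks and list, for each "
            "subtask, the indices of subtasks it depends on. Reply as JSON "
            '{"subtasks": [...], "deps": {"i": [j, ...]}}.\n'
            f"Task: {instruction}\n{feedback}"
        ))
        # 2. Critic role: a second LLM call judges feasibility and coherence.
        verdict = json.loads(llm(
            "You are a critic. Is this plan feasible and coherent? Reply as "
            'JSON {"ok": true, "reason": "..."}.\n'
            f"Plan: {json.dumps(plan)}"
        ))
        if verdict["ok"]:
            return plan  # subtasks plus action dependency graph (edge lists)
        feedback = f"Previous plan was rejected because: {verdict['reason']}"
    raise RuntimeError("Planner failed to produce a plan the critic accepts")
```

The `deps` field of the returned plan plays the role of the action dependency graph: each subtask lists the subtasks whose completion it requires, which in turn determines which agents must coordinate.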

Graph-based Collaboration Meta-Policy

The graph-based collaboration meta-policy facilitates communication and collaboration among agents, leveraging the action dependency graph generated by the LLM planner. This policy is designed to be adaptive, using a meta-learning approach to generalize to new task environments. The graph structure allows the policy to efficiently manage inter-agent communication, ensuring that relevant information is shared between agents that are dependent on each other's actions. By using meta-learning, the policy can quickly adapt to new tasks with minimal retraining, improving the scalability and applicability of the system.
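One way to picture a graph-conditioned policy is a single round of message passing over the action dependency graph: each agent's observation embedding is mixed with the embeddings of the agents it depends on before action logits are produced. The sketch below is an assumption-laden illustration of this idea, not the authors' architecture; layer sizes, the aggregation rule, and the single message-passing round are all choices made here for brevity:

```python
# Illustrative graph-conditioned policy: one round of message passing over
# the action dependency graph, then per-agent action logits. All layer sizes
# and the mean-aggregation rule are assumptions for this sketch.
import torch
import torch.nn as nn

class GraphCollabPolicy(nn.Module):
    def __init__(self, obs_dim: int, hidden: int, n_actions: int):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.message = nn.Linear(hidden, hidden)
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, obs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim); adj: (n_agents, n_agents), where
        # adj[i, j] = 1 if agent i's subtask depends on agent j's.
        h = torch.relu(self.encode(obs))
        # Average messages only from the agents this agent depends on.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        m = torch.relu(self.message(adj @ h / deg))
        # Combine an agent's own embedding with its aggregated messages.
        return self.head(torch.cat([h, m], dim=-1))
```

Under this reading, the meta-learning component would correspond to adapting these shared weights with a few gradient steps (or episodes) on a new task before deployment, so the policy generalizes without full retraining.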

Experimental Evaluation on AI2-THOR

The efficacy of LGC-MARL was evaluated on the AI2-THOR simulation platform, which provides a diverse set of simulated household tasks requiring agents to navigate, manipulate objects, and coordinate their actions to achieve specific goals. The reported results indicate that LGC-MARL outperforms existing methods in task completion rate and efficiency, is robust to environmental variations, and scales well to more complex tasks.

Implications and Future Directions

LGC-MARL represents a significant advancement in the field of MAS, effectively integrating the strengths of LLMs and MARL to address the challenges of complex task execution and agent coordination. The use of an LLM planner to decompose tasks and generate action dependency graphs provides a structured approach to managing complex interactions, while the graph-based collaboration meta-policy enables efficient communication and adaptation. The experimental results on AI2-THOR support the claim that LGC-MARL achieves superior performance and scalability. Further research could explore the application of LGC-MARL to other complex domains, investigate the use of different LLM architectures, and develop more sophisticated meta-learning techniques to improve the adaptability and robustness of the system.

In conclusion, LGC-MARL offers a promising framework for enhancing multi-agent systems by leveraging LLMs for planning and graph-based policies for efficient collaboration, demonstrating improved performance and scalability in complex simulated tasks.