Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment
(2502.16863v1)
Published 24 Feb 2025 in cs.MA, cs.LG, and cs.RO
Abstract: Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team? This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent works showing that LLMs demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension, LLM-TACA, where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.
The paper presents LLM-MCA, which reframes multi-agent credit assignment as a pattern recognition problem using a centralized LLM reward-critic.
It extends the approach with LLM-TACA, where the LLM critic assigns explicit intermediary tasks to agents for enhanced collaboration.
Experiments show both methods outperform state-of-the-art techniques on benchmarks, improving performance and explainability in MARL.
Overview of LLM-MCA and LLM-TACA
The paper "Leveraging LLMs for Effective and Explainable Multi-Agent Credit Assignment" (Nagpal et al., 24 Feb 2025) introduces a novel approach to multi-agent credit assignment (MCA) by leveraging the capabilities of LLMs. The central thesis is that MCA can be effectively reformulated as pattern recognition problems, specifically sequence improvement and attribution. This perspective motivates the LLM-MCA method, which employs a centralized LLM reward-critic to decompose the environmental reward and provide individualized feedback to each agent. Furthermore, the paper presents an extension, LLM-TACA, where the LLM critic performs explicit task assignment, communicating intermediary goals directly to the agents.
LLM-MCA: LLM-Based Multi-Agent Credit Assignment
LLM-MCA addresses the challenge of evaluating individual agent contributions within a centralized-training decentralized-execution paradigm, a common approach in multi-agent reinforcement learning (MARL). The method hinges on the observation that human experts often outperform existing MCA techniques when manually assessing agent behavior. LLM-MCA capitalizes on the demonstrated pattern recognition abilities of LLMs to mimic and enhance this human-level evaluation.
The core of LLM-MCA is a centralized LLM reward-critic. This critic receives as input the state-action trajectories of all agents within the environment. The LLM then processes this information to numerically decompose the global environmental reward, assigning credit to each agent based on their perceived contribution. This credit assignment is achieved by framing the problem as sequence improvement and attribution. The LLM assesses how each agent's actions contribute to or detract from the overall team performance. The individual agents' policy networks are then updated based on the reward signals provided by the LLM critic.
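The decomposition step described above can be sketched in a few lines of Python. This is a minimal, hedged illustration, not the paper's implementation: `query_llm` is a hypothetical placeholder standing in for a real prompt to the LLM reward-critic, and the uniform weighting it returns exists only so the sketch runs offline.

```python
from typing import Dict, List

def query_llm(trajectory_summary: str, agent_ids: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for prompting the LLM reward-critic.

    A real implementation would send the joint state-action trajectory to an
    LLM and parse numeric contribution weights from its response; here we
    return a uniform weighting so the sketch runs without any API access.
    """
    return {a: 1.0 for a in agent_ids}

def llm_credit_assignment(trajectory_summary: str,
                          agent_ids: List[str],
                          team_reward: float) -> Dict[str, float]:
    """Decompose the global team reward into per-agent credits."""
    weights = query_llm(trajectory_summary, agent_ids)
    total = sum(weights.values())
    # Normalize so the individual credits sum to the environment's team reward.
    return {a: team_reward * w / total for a, w in weights.items()}

credits = llm_credit_assignment(
    "agent_0 delivered the shelf; agent_1 blocked the aisle",
    ["agent_0", "agent_1"],
    10.0,
)
```

The resulting per-agent credits would then serve as the reward signals used to update each agent's policy network.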
LLM-TACA: LLM-Based Task Assignment and Credit Assignment
LLM-TACA extends the LLM-MCA framework by incorporating explicit task assignment. In LLM-TACA, the LLM critic not only decomposes the reward but also generates intermediary goals for each agent. These goals are communicated directly to the agent policies, guiding their behavior and facilitating more effective collaboration.
This task assignment process allows the LLM to provide more nuanced and targeted feedback to each agent, potentially leading to improved learning and performance. By explicitly defining sub-goals, the LLM-TACA approach may also enhance the explainability of the agents' behavior, as their actions can be directly linked to the assigned tasks.
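A rough sketch of the richer feedback LLM-TACA provides might pair each numeric credit with a subgoal string. This is an assumed interface, not the paper's actual one: the uniform credit split and placeholder subgoal text below stand in for what the LLM critic would generate from the observed trajectory.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CriticFeedback:
    credit: float   # individualized reward signal for this agent
    subgoal: str    # intermediary goal passed directly to the agent's policy

def llm_task_and_credit(trajectory_summary: str,
                        agent_ids: List[str],
                        team_reward: float) -> Dict[str, CriticFeedback]:
    # Stub: split the credit uniformly and attach a placeholder subgoal per
    # agent. The paper's LLM critic would derive both from the trajectory.
    share = team_reward / len(agent_ids)
    return {
        a: CriticFeedback(credit=share, subgoal=f"placeholder subgoal for {a}")
        for a in agent_ids
    }

feedback = llm_task_and_credit("joint trajectory summary", ["agent_0", "agent_1"], 4.0)
```

Conditioning each policy on its `subgoal` is what makes the agents' behavior directly attributable to an assigned task.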
Experimental Results
The paper reports that both LLM-MCA and LLM-TACA outperform state-of-the-art methods across a range of benchmark environments. These environments include Level-Based Foraging, Robotic Warehouse, and a novel benchmark called Spaceworld, which incorporates collision-related safety constraints. The superior performance suggests that leveraging LLMs for credit assignment and task allocation can significantly improve the effectiveness of MARL algorithms. Furthermore, the methods generate trajectory datasets annotated with per-agent reward information, as sampled from the LLM critics.
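To make the dataset artifact concrete, one annotated timestep might be serialized as a JSON record like the following. The field names here are illustrative assumptions; the paper specifies only that each timestep is annotated with per-agent reward information sampled from the LLM critics.

```python
import json

# Hypothetical schema for one annotated timestep of a generated trajectory.
record = {
    "timestep": 3,
    "joint_observation": [[0, 1], [4, 2]],
    "joint_action": ["move_left", "pick"],
    "team_reward": 1.0,
    "per_agent_reward": {"agent_0": 0.2, "agent_1": 0.8},
}
line = json.dumps(record)  # one JSON line per timestep
```

Note that the per-agent rewards sum to the team reward, reflecting the numeric decomposition performed by the critic.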
Implications and Significance
The LLM-MCA and LLM-TACA methods offer a promising direction for advancing the field of MARL. By reformulating credit assignment as a pattern recognition problem and leveraging the capabilities of LLMs, these approaches demonstrate improved performance and potentially enhanced explainability. The generation of annotated trajectory datasets is a valuable contribution, enabling further research and analysis of multi-agent behavior. These contributions carry growing weight as multi-agent systems are deployed on increasingly complex real-world problems.
In conclusion, the paper introduces two novel methods, LLM-MCA and LLM-TACA, that leverage LLMs for effective and explainable multi-agent credit assignment. These methods outperform existing techniques on a variety of benchmarks and offer valuable insights into the potential of LLMs for advancing the field of MARL.