An Evaluation of MARL Algorithms in Cooperative Tasks
The paper "Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks" provides a comprehensive assessment of multi-agent reinforcement learning (MARL) methods. Given the diversity and complexity of recent MARL algorithms integrating deep learning, this paper identifies a significant gap in standardized benchmarks, metrics, and evaluation protocols that hampers effective comparative analysis of these algorithms. The authors propose a systematic comparative paper of three categories of MARL algorithms: independent learning (IL), centralized training decentralized execution (CTDE), and value decomposition methods, examining their performance across various cooperative task settings.
Overview of MARL Algorithms
The authors evaluate nine representative MARL algorithms, including three IL methods: Independent Q-Learning (IQL), Independent A2C (IA2C), and Independent PPO (IPPO); four CTDE methods: MADDPG, COMA, MAA2C, and MAPPO; and two value decomposition methods: VDN and QMIX. The assessment spans 25 cooperative tasks drawn from environments including matrix games, the multi-agent particle environment (MPE), the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and the Multi-Robot Warehouse (RWARE). Each environment presents distinct challenges such as partial observability, sparse rewards, and the complexity of coordination among agents.
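To make the independent-learning category concrete, the following is a minimal sketch, assuming a climbing-game-style payoff matrix and stateless tabular Q-learning (both illustrative choices, not the paper's exact setup), of how each agent learns its own value estimates while treating its teammate as part of the environment.

```python
# Minimal sketch of independent Q-learning (IQL) in a 2-player cooperative matrix game.
# Each agent keeps its own Q-table and ignores the other's action during updates;
# the payoff matrix and hyperparameters below are illustrative, not from the paper.
import numpy as np

payoff = np.array([[11, -30,  0],
                   [-30,  7,  6],
                   [  0,  0,  5]])   # climbing-game-style shared payoff (illustrative)

n_actions = 3
q = [np.zeros(n_actions), np.zeros(n_actions)]   # one Q-table per agent
alpha, eps = 0.1, 0.1
rng = np.random.default_rng(0)

for step in range(5000):
    # epsilon-greedy action selection, performed independently by each agent
    acts = [rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q[i]))
            for i in range(2)]
    r = payoff[acts[0], acts[1]]                  # shared cooperative reward
    for i in range(2):
        # stateless, bandit-style update: the other agent is just "part of the environment"
        q[i][acts[i]] += alpha * (r - q[i][acts[i]])

print("Greedy joint action:", [int(np.argmax(qi)) for qi in q])
```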
Empirical Findings
The findings suggest that despite the limitations often associated with IL in MARL settings, IL algorithms can perform effectively in fully observable environments with modest coordination requirements, such as many LBF tasks and simpler SMAC scenarios. IL's limitations become apparent, however, under partial observability and when agents must coordinate extensively, as seen in RWARE and harder SMAC tasks. Here, CTDE methods demonstrate their strength by leveraging centralized critics to approximate joint value functions and improve coordination.
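The centralized-critic idea can be illustrated with a short sketch. The architecture below is an assumed, simplified layout (layer sizes and dimensions are placeholders, not the authors' networks): each actor conditions only on its local observation, so execution stays decentralized, while the critic conditions on the joint observation and is used only during training.

```python
# Simplified CTDE pattern: decentralized actors, one centralized critic.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):                       # obs: (batch, obs_dim), local only
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_agents * obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, joint_obs):                 # joint_obs: (batch, n_agents*obs_dim)
        return self.net(joint_obs).squeeze(-1)    # estimate of the joint value

n_agents, obs_dim, n_actions = 3, 10, 5
actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
critic = CentralCritic(n_agents, obs_dim)

obs = torch.randn(4, n_agents, obs_dim)           # a batch of joint observations
acts = [actors[i](obs[:, i]).sample() for i in range(n_agents)]  # decentralized acting
values = critic(obs.reshape(4, -1))               # centralized value, training-time only
```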
MADDPG and COMA, representative of centralized policy gradient methods, generally underperform, particularly in RWARE tasks with sparse rewards. Both appear sensitive to the difficulty of training accurate centralized critics across diverse environments. In contrast, MAA2C and MAPPO achieve competitive results, with MAPPO often coming out ahead, plausibly because its clipped surrogate objective allows several update epochs per batch of collected experience, improving sample efficiency relative to MAA2C's single update per batch.
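A brief sketch of the mechanism behind that sample-efficiency argument follows; the function and tensors are generic placeholders rather than the paper's implementation.

```python
# PPO-style clipped surrogate loss: because the ratio is clipped, the same batch
# can be reused for several gradient epochs, unlike a vanilla actor-critic update
# that consumes each batch once.
import torch

def ppo_policy_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    ratio = torch.exp(new_logp - old_logp)                 # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()           # pessimistic (clipped) objective

# Illustrative usage with random placeholder data:
old_logp = torch.randn(32)
new_logp = old_logp + 0.05 * torch.randn(32)
adv = torch.randn(32)
loss = ppo_policy_loss(new_logp, old_logp, adv)
# MAA2C-style training would instead apply one -(logp * advantage) update per batch.
```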
Value decomposition methods such as VDN and QMIX show strong performance across most environments except RWARE. VDN's assumption that the joint value function decomposes as a sum of per-agent utilities sometimes limits its applicability, whereas QMIX's state-conditioned monotonic mixing network provides a more expressive decomposition, which is notably helpful in complex tasks requiring nuanced coordination under non-linear dynamics.
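The contrast between the two decompositions can be sketched as follows. Shapes, layer widths, and names are illustrative assumptions; the point is that VDN sums per-agent utilities directly, while QMIX mixes them with non-negative weights generated from the global state, keeping the joint value monotonic in each agent's utility.

```python
# VDN vs. QMIX-style value decomposition (illustrative sizes and names).
import torch
import torch.nn as nn

def vdn_mix(agent_qs):                   # agent_qs: (batch, n_agents)
    return agent_qs.sum(dim=-1)          # Q_tot = sum_i Q_i  (linear decomposition)

class QmixMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        # hypernetworks generate mixing weights from the global state
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)
        self.embed = embed

    def forward(self, agent_qs, state):   # agent_qs: (B, n_agents), state: (B, state_dim)
        B, n = agent_qs.shape
        w1 = torch.abs(self.w1(state)).view(B, n, self.embed)  # |w| keeps dQ_tot/dQ_i >= 0
        b1 = self.b1(state).view(B, 1, self.embed)
        h = torch.relu(agent_qs.view(B, 1, n) @ w1 + b1)        # (B, 1, embed)
        w2 = torch.abs(self.w2(state)).view(B, self.embed, 1)
        b2 = self.b2(state).view(B, 1, 1)
        return (h @ w2 + b2).view(B)                            # Q_tot, non-linear in state
```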
Implications and Future Work
The release of EPyMARL is a significant contribution: a unified codebase that implements the evaluated algorithms under common implementation practices, enabling standardized, like-for-like comparisons and providing a foundation for evaluating new MARL approaches. The authors also open-source two environments, LBF and RWARE, designed around sparse-reward coordination tasks, further broadening the evaluation setups available to the community.
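For readers who want to try the open-sourced environments, the sketch below shows a typical Gym-style instantiation; the exact environment IDs and return signatures depend on the installed versions of the lbforaging and rware packages, so treat them as assumptions.

```python
# Hedged usage sketch of the two open-sourced environments via the Gym API.
import gym
import lbforaging  # noqa: F401  (registers Level-Based Foraging tasks with Gym)
import rware       # noqa: F401  (registers Multi-Robot Warehouse tasks with Gym)

lbf_env = gym.make("Foraging-8x8-2p-3f-v2")    # assumed ID: 8x8 grid, 2 agents, 3 food items
rware_env = gym.make("rware-tiny-2ag-v1")      # assumed ID: tiny warehouse layout, 2 agents

obs = lbf_env.reset()                          # older Gym API: one observation per agent
actions = lbf_env.action_space.sample()        # tuple of per-agent discrete actions
obs, rewards, done, info = lbf_env.step(actions)  # per-agent observations and rewards
```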
While the research sheds light on algorithm strengths and limitations, it underscores several avenues for future work, particularly in competitive MARL domains, exploration optimization under sparse rewards, and advanced coordination under partial observability. As MARL continues to progress, studies refining intrinsic motivation and advanced communication strategies among agents promise to enhance the efficacy and applicability of MARL solutions in practical, real-world scenarios.
In conclusion, this paper stands as a rigorous guide to benchmarking MARL algorithms, valuable both for framing baseline comparisons and for motivating future algorithmic innovations in multi-agent artificial intelligence.