SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning (2212.07489v2)

Published 14 Dec 2022 in cs.LG and cs.MA

Abstract: The availability of challenging benchmarks has played a key role in the recent progress of machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular testbed for centralised training with decentralised execution. However, after years of sustained improvement on SMAC, algorithms now achieve near-perfect performance. In this work, we conduct new analysis demonstrating that SMAC lacks the stochasticity and partial observability to require complex closed-loop policies. In particular, we show that an open-loop policy conditioned only on the timestep can achieve non-trivial win rates for many SMAC scenarios. To address this limitation, we introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation. We also introduce the extended partial observability challenge (EPO), which augments SMACv2 to ensure meaningful partial observability. We show that these changes ensure the benchmark requires the use of closed-loop policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it presents significant challenges not present in the original benchmark. Our analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC and can help benchmark the next generation of MARL methods. Videos of training are available at https://sites.google.com/view/smacv2.

An Overview of SMACv2: Advancements in Cooperative Multi-Agent Reinforcement Learning Benchmarks

Benchmarks play a central role in advancing machine learning methods, and cooperative multi-agent reinforcement learning (MARL) is no exception. The paper "SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning" introduces an updated benchmark, SMACv2, that addresses limitations identified in the original StarCraft Multi-Agent Challenge (SMAC). This overview outlines the enhancements made to the benchmark to better evaluate MARL algorithms, emphasizing the need for environments that demand complex, closed-loop policies under partial observability and stochasticity.

Key Improvements in SMACv2

The original SMAC benchmark, while widely adopted, has stopped being a strong test: after years of improvement, algorithms now achieve near-perfect performance on it. In particular, the benchmark lacks the stochasticity and meaningful partial observability that would require closed-loop policies. The paper demonstrates this through new analysis, showing that an open-loop policy, conditioned only on the timestep and ignoring all observations, can achieve non-trivial win rates in many SMAC scenarios. Such a policy can only succeed when episodes unfold in much the same way every time, which points to how little randomness SMAC's initial states and transitions carry.
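The distinction between open-loop and closed-loop policies can be illustrated with a small toy problem. The sketch below is not taken from the paper or from the SMAC codebase; the one-dimensional environment and the names `run_episode`, `open_loop`, `closed_loop`, and `win_rate` are hypothetical and chosen only for illustration. It shows that a policy depending only on the timestep wins every episode when the start state is fixed, but fails on most episodes once the start state varies.

```python
# Toy illustration (hypothetical, not SMAC): an agent on a line must be
# standing on GOAL when the episode ends after HORIZON steps.
GOAL, HORIZON = 5, 10


def run_episode(policy, start):
    """Roll out one deterministic episode and report whether it was won."""
    pos = start
    for t in range(HORIZON):
        pos += policy(t, pos)  # action is -1, 0, or +1
    return pos == GOAL


def open_loop(t, pos):
    """Depends only on the timestep; `pos` is ignored.

    The plan is tuned for start position 0: step right GOAL times, then wait.
    """
    return 1 if t < GOAL else 0


def closed_loop(t, pos):
    """Depends on the observation: step toward the goal, stop once there."""
    return (GOAL > pos) - (GOAL < pos)


def win_rate(policy, starts):
    """Fraction of episodes won over the given start positions."""
    return sum(run_episode(policy, s) for s in starts) / len(starts)


fixed_starts = [0] * 11           # same start every episode
varied_starts = list(range(11))   # starts 0..10, each used once

if __name__ == "__main__":
    for name, policy in [("open-loop", open_loop), ("closed-loop", closed_loop)]:
        print(name, win_rate(policy, fixed_starts), win_rate(policy, varied_starts))
```

With a fixed start the two policies are indistinguishable by win rate, which is why a benchmark with little randomness cannot tell whether an algorithm has learned to use its observations.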

To address these shortcomings, SMACv2 introduces several critical changes:

Procedural Content Generation: Scenarios in SMACv2 are procedurally generated, so team compositions and starting positions vary from episode to episode. Agents can no longer succeed by replaying a fixed action sequence and must instead generalise to previously unseen settings drawn from the same distribution.
Extended Partial Observability (EPO): The EPO challenge augments SMACv2 to ensure meaningful partial observability. By stochastically masking enemy observations and altering action availability, EPO makes it harder for any single agent to act on complete information, which raises the demands on coordination and decentralized decision-making.
True Unit Ranges and Randomized Initial Positions: Unlike SMAC, which used fixed sight and attack ranges, SMACv2 employs the true in-game ranges, so unit types differ more in how they interact. Random start positions further increase the strategic complexity, requiring robust situational awareness and coordination.
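The first two changes above can be sketched in a few lines. The code below is a simplified illustration and not the SMACv2 implementation: `Scenario`, `sample_scenario`, and `mask_enemy_observations` are hypothetical names, the unit weights and map size are illustrative assumptions, and the masking rule is a deliberately reduced version of the idea (each in-range sighting is kept independently with probability `p_see`), so the benchmark's actual rule may differ in detail.

```python
import random
from dataclasses import dataclass

# Illustrative unit pool; the weights are assumptions, not benchmark settings.
UNIT_TYPES = ["marine", "marauder", "medivac"]
UNIT_WEIGHTS = [0.45, 0.45, 0.10]


@dataclass
class Scenario:
    team: list        # unit type of each ally
    positions: list   # (x, y) start position of each ally


def sample_scenario(rng, n_units, map_size=32.0):
    """Draw a fresh team composition and start positions for one episode."""
    team = rng.choices(UNIT_TYPES, weights=UNIT_WEIGHTS, k=n_units)
    positions = [
        (rng.uniform(0.0, map_size), rng.uniform(0.0, map_size))
        for _ in range(n_units)
    ]
    return Scenario(team=team, positions=positions)


def mask_enemy_observations(rng, visible, p_see):
    """Stochastically hide enemies that are within sight range.

    visible[a][e] is True when enemy e is within ally a's sight range.
    Each such sighting is kept with probability p_see; enemies that were
    out of range stay hidden.
    """
    return [[seen and rng.random() < p_see for seen in row] for row in visible]


if __name__ == "__main__":
    rng = random.Random(0)
    scenario = sample_scenario(rng, n_units=5)
    print(scenario.team)
    in_range = [[True, False, True], [True, True, False]]
    print(mask_enemy_observations(rng, in_range, p_see=0.5))
```

Because a new scenario is drawn every episode, a policy evaluated this way is always tested on compositions and positions it has not seen exactly before, while the masking step means that two allies looking at the same enemy may hold different information.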

Evaluations and Implications

Evaluations of state-of-the-art algorithms such as MAPPO and QMIX show that SMACv2 presents substantial challenges that were absent from its predecessor. These algorithms struggle to maintain high win rates on the new scenarios, which indicates that the modifications make the benchmark harder. The results suggest that algorithms must make better use of observations when learning policies, since SMACv2 exercises stochasticity and partial observability that the original SMAC left largely untested.

SMACv2 shifts the evaluation of MARL algorithms toward adaptability and coordination under greater uncertainty and partial information. This shift encourages the development of algorithms that can handle decentralized control problems, which are common in real-world applications.

Future Directions

While SMACv2 is a significant step forward, future MARL benchmarks may integrate richer sources of randomness and more complex observation models. Expanding the diversity and scale of tasks within these benchmarks could further challenge algorithms and reveal how well they scale and how robust they are.

In conclusion, SMACv2 addresses critical limitations in the evaluation of MARL methodologies, fostering advancements in algorithmic development that are more aligned with practical, real-world deployment needs. As researchers continue to engage with this improved benchmark, it is likely to drive innovation and refinement in cooperative multi-agent systems.

Authors (8)

Benjamin Ellis
Jonathan Cook
Skander Moalla
Mikayel Samvelyan
Mingfei Sun
Anuj Mahajan
Jakob N. Foerster
Shimon Whiteson

GitHub

GitHub - oxwhirl/smacv2 (252 stars)