The StarCraft Multi-Agent Challenge (SMAC): A Benchmark for Cooperative Multi-Agent Reinforcement Learning
The paper introduces the StarCraft Multi-Agent Challenge (SMAC), a benchmark environment tailored to cooperative Multi-Agent Reinforcement Learning (MARL). SMAC is built on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges for assessing MARL algorithms. The benchmark features diverse combat scenarios in which each unit is controlled by an independent agent acting on partial, local observations.
Introduction
The authors motivate multi-agent scenarios with real-world applications such as self-driving cars, autonomous drones, and distributed sensor networks. In these settings, individual agents must operate under partial observability and decentralization constraints, making coordination inherently challenging. Yet the field has lacked a standardized benchmark comparable to the Arcade Learning Environment (ALE) for single-agent RL or MuJoCo for continuous control. SMAC fills this gap by offering a structured, challenging environment for systematically measuring progress in MARL.
SMAC Environment
SMAC leverages the StarCraft II Learning Environment (SC2LE) to provide a suite of micromanagement challenges for evaluating decentralized control algorithms. Agents must cooperate under partial observability, relying solely on local information restricted to their field of view. This setup mirrors many practical applications in which full state observability is infeasible.
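The environment is exposed through a Python interface. The following minimal sketch assumes the public `smac` package's `StarCraft2Env` API (and a local StarCraft II installation); exact signatures may differ between releases.

```python
# Minimal sketch: create a SMAC scenario and inspect its interface.
# Assumes the public `smac` package; requires a local StarCraft II install.
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3m")     # 3 Marines vs. 3 Marines
env_info = env.get_env_info()          # n_agents, n_actions, obs/state shapes, episode limit

n_agents = env_info["n_agents"]
n_actions = env_info["n_actions"]

env.reset()
obs = env.get_obs()      # list of per-agent local observations (partial view)
state = env.get_state()  # global state, intended only for centralized training
env.close()
```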
Scenarios
The paper details a variety of combat scenarios, each designed to require different micromanagement techniques. A table in the original paper lists the configurations, specifying the units involved and the parameters that vary across challenges (example map names for each category are sketched after the list). These scenarios include:
- Symmetric battles: Both allied and enemy units have the same composition.
- Asymmetric battles: The enemy army outnumbers or outclasses the allies.
- Micro-trick scenarios: Winning requires advanced micromanagement techniques such as kiting, focus fire, or exploiting terrain features.
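For concreteness, the grouping below pairs each category with map names from the public SMAC release; the assignment is illustrative and may not match the paper's table exactly.

```python
# Example scenario (map) names by category, taken from the public SMAC release.
# The grouping is illustrative; consult the paper's scenario table for the
# authoritative list and categorization.
SCENARIOS = {
    "symmetric":   ["3m", "8m", "2s3z", "3s5z", "MMM"],
    "asymmetric":  ["5m_vs_6m", "8m_vs_9m", "27m_vs_30m", "MMM2"],
    "micro_trick": ["2m_vs_1z", "3s_vs_5z", "6h_vs_8z", "2c_vs_64zg", "corridor"],
}

# Any of these strings can be passed as map_name to StarCraft2Env.
```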
Observations and Actions
Each agent receives a feature vector describing the units within its limited field of view, including the relative positions, health, and unit types of nearby allies and enemies. This enforces partial observability and challenges the learning algorithms. The action space comprises movement in the four cardinal directions, attacking a specific enemy unit within shooting range, and, for Medivac units, healing allied units instead of attacking.
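The sketch below illustrates one episode of decentralized interaction under this interface, again assuming the public `smac` API; action selection is random over the currently available actions and purely illustrative.

```python
# One episode of decentralized interaction: each agent sees only its own local
# observation and may only select actions that are currently available (e.g.
# attack actions only for enemies in shooting range). Random action choice is
# a stand-in for a learned policy.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="2s3z")
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
episode_return = 0.0

while not terminated:
    actions = []
    for agent_id in range(n_agents):
        obs = env.get_obs_agent(agent_id)              # local feature vector
        avail = env.get_avail_agent_actions(agent_id)  # 0/1 mask over actions
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)       # one shared team reward
    episode_return += reward

print("return:", episode_return, "won:", info.get("battle_won", False))
env.close()
```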
Reward Structure
The headline evaluation metric is the win rate — the fraction of episodes won. The default reward signal, however, is shaped: the team accrues reward for hit-point damage dealt and for enemy units killed, with an additional bonus for winning the battle. Shaped rewards ease credit assignment and generally improve training convergence.
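The function below sketches a shaped team reward of this kind; the constants and bookkeeping are illustrative assumptions, not SMAC's actual implementation.

```python
# Illustrative shaped team reward for one time step. The bonus values are
# assumptions chosen for the sketch, not SMAC's exact defaults.
def shaped_reward(damage_dealt, enemies_killed, battle_won,
                  kill_bonus=10.0, win_bonus=200.0):
    # damage_dealt   -- hit-point damage dealt to enemy units this step
    # enemies_killed -- number of enemy units destroyed this step
    # battle_won     -- True on the step the last enemy unit dies
    reward = damage_dealt + kill_bonus * enemies_killed
    if battle_won:
        reward += win_bonus
    return reward
```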
PyMARL Framework
To facilitate research, the authors release PyMARL, an open-source framework for deep MARL. PyMARL implements several state-of-the-art algorithms, including QMIX, QTRAN, and COMA, alongside baselines such as IQL and VDN. It is modular, readable, and built on PyTorch, making it a convenient starting point for the MARL research community.
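As a flavor of the value-factorization methods PyMARL implements, the sketch below gives a compact PyTorch implementation of a QMIX-style mixing network: per-agent Q-values are combined into a joint Q_tot by a state-conditioned network whose weights come from hypernetworks and are kept non-negative, so Q_tot is monotonic in each agent's Q-value. Layer sizes and shapes are illustrative, not PyMARL's exact configuration.

```python
# QMIX-style mixing network (sketch). Hypernetworks map the global state to the
# mixing weights; absolute values keep the weights non-negative so that Q_tot is
# monotonic in every agent's individual Q-value.
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        batch = agent_qs.size(0)
        agent_qs = agent_qs.view(batch, 1, self.n_agents)
        w1 = torch.abs(self.hyper_w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(batch, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2          # (batch, 1, 1)
        return q_tot.view(batch, 1)
```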
Results and Analysis
The experiments demonstrate the usefulness of the SMAC benchmark by evaluating state-of-the-art algorithms across the suite of scenarios. Training is periodically paused to run 32 test episodes with decentralized greedy action selection, and the median test win percentage across independent runs is reported. Key results include:
- Overall Performance: QMIX significantly outperforms other algorithms across most scenarios, demonstrating superior sample efficiency.
- Scenario Categorization:
- Easy Scenarios: QMIX reaches near-optimal win rates, while IQL and COMA struggle, underscoring the importance of a more expressive centralized value function.
- Hard Scenarios: These demand finer coordination and precise unit control; current methods show mixed success.
- Super-Hard Scenarios: These pose exploration bottlenecks and severe partial observability; all evaluated methods achieve only limited success.
The median test win-percentage curves and per-scenario observations identify where current algorithms fall short, underscoring the need for advances in multi-agent exploration and coordination mechanisms.
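The bookkeeping behind this evaluation is straightforward; the sketch below shows one way to compute it, where `run_test_episode` is a hypothetical placeholder for rolling out the current greedy policies.

```python
# Sketch of the evaluation protocol described above: a fixed number of test
# episodes per checkpoint, then the median test win percentage across
# independent training runs (seeds). `run_test_episode` is hypothetical and
# should return 1 for a won episode and 0 otherwise.
import numpy as np

def test_win_percentage(run_test_episode, n_episodes=32):
    wins = sum(run_test_episode() for _ in range(n_episodes))
    return 100.0 * wins / n_episodes

def median_over_runs(win_percentages):
    # win_percentages: one test win percentage per independent training run
    return float(np.median(win_percentages))
```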
Conclusion and Future Work
The paper concludes by emphasizing SMAC's potential as a long-term benchmark for MARL. Future work intends to introduce more challenging scenarios involving diverse unit types and complex coordination tasks. The authors envision SMAC as a crucial tool to drive systematic and significant advances in MARL research, particularly in multi-agent exploration and coordination.
In providing a realistic, complex, yet structured environment for MARL, SMAC paves the way for more meticulous and comprehensive evaluation of decentralized control algorithms. The open-sourced PyMARL framework further enables reproducibility and extensibility, fostering a collaborative research ecosystem.