The StarCraft Multi-Agent Challenge (SMAC): A Benchmark for Cooperative Multi-Agent Reinforcement Learning
The paper introduces the StarCraft Multi-Agent Challenge (SMAC), a benchmark environment tailored to cooperative Multi-Agent Reinforcement Learning (MARL). SMAC is built on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges for assessing MARL algorithms. The benchmark features diverse combat scenarios in which each unit is controlled by an independent agent acting on partial, local observations.
Introduction
The authors motivate multi-agent scenarios with real-world applications such as self-driving cars, autonomous drones, and distributed sensor networks. In these settings, individual agents must operate under partial observability and decentralization constraints, making coordination inherently challenging. Yet the field has lacked a standardized benchmark comparable to the Arcade Learning Environment (ALE) for single-agent RL or MuJoCo for continuous control. SMAC fills this gap by offering a structured, challenging environment for systematically measuring progress in MARL.
SMAC Environment
SMAC leverages the StarCraft II Learning Environment (SC2LE) to provide a suite of micromanagement challenges for evaluating decentralized control algorithms. Agents must cooperate under partial observability, relying solely on local information restricted to their field of view. This setup mirrors many practical applications in which full state observability is infeasible.
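The environment is exposed through a Python interface. The following minimal sketch assumes the public `smac` package's `StarCraft2Env` API (and a local StarCraft II installation); exact signatures may differ between releases.

```python
# Minimal sketch: create a SMAC scenario and inspect its interface.
# Assumes the public `smac` package; requires a local StarCraft II install.
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3m")     # 3 Marines vs. 3 Marines
env_info = env.get_env_info()          # n_agents, n_actions, obs/state shapes, episode limit

n_agents = env_info["n_agents"]
n_actions = env_info["n_actions"]

env.reset()
obs = env.get_obs()      # list of per-agent local observations (partial view)
state = env.get_state()  # global state, intended only for centralized training
env.close()
```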
Scenarios
The paper details a variety of combat scenarios, each designed to require different micromanagement techniques. A table in the original paper lists the configurations, specifying the units involved and the parameters that vary across challenges (example map names for each category are sketched after the list). These scenarios include:
- Symmetric battles: Both allied and enemy units have the same composition.
- Asymmetric battles: The enemy army outnumbers or outclasses the allies.
- Micro-trick scenarios: Winning requires advanced micromanagement techniques such as kiting, focus fire, or exploiting terrain features.
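For concreteness, the grouping below pairs each category with map names from the public SMAC release; the assignment is illustrative and may not match the paper's table exactly.

```python
# Example scenario (map) names by category, taken from the public SMAC release.
# The grouping is illustrative; consult the paper's scenario table for the
# authoritative list and categorization.
SCENARIOS = {
    "symmetric":   ["3m", "8m", "2s3z", "3s5z", "MMM"],
    "asymmetric":  ["5m_vs_6m", "8m_vs_9m", "27m_vs_30m", "MMM2"],
    "micro_trick": ["2m_vs_1z", "3s_vs_5z", "6h_vs_8z", "2c_vs_64zg", "corridor"],
}

# Any of these strings can be passed as map_name to StarCraft2Env.
```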
Observations and Actions
Each agent receives a feature vector describing the units within its limited field of view, including the relative positions, health, and unit types of nearby allies and enemies. This enforces partial observability and challenges the learning algorithms. The action space comprises movement in the four cardinal directions, attacking a specific enemy unit within shooting range, and, for Medivac units, healing allied units instead of attacking.
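The sketch below illustrates one episode of decentralized interaction under this interface, again assuming the public `smac` API; action selection is random over the currently available actions and purely illustrative.

```python
# One episode of decentralized interaction: each agent sees only its own local
# observation and may only select actions that are currently available (e.g.
# attack actions only for enemies in shooting range). Random action choice is
# a stand-in for a learned policy.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="2s3z")
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
episode_return = 0.0

while not terminated:
    actions = []
    for agent_id in range(n_agents):
        obs = env.get_obs_agent(agent_id)              # local feature vector
        avail = env.get_avail_agent_actions(agent_id)  # 0/1 mask over actions
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)       # one shared team reward
    episode_return += reward

print("return:", episode_return, "won:", info.get("battle_won", False))
env.close()
```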
Reward Structure
The headline evaluation metric is the win rate — the fraction of episodes won. The default reward signal, however, is shaped: the team accrues reward for hit-point damage dealt and for enemy units killed, with an additional bonus for winning the battle. Shaped rewards ease credit assignment and generally improve training convergence.
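The function below sketches a shaped team reward of this kind; the constants and bookkeeping are illustrative assumptions, not SMAC's actual implementation.

```python
# Illustrative shaped team reward for one time step. The bonus values are
# assumptions chosen for the sketch, not SMAC's exact defaults.
def shaped_reward(damage_dealt, enemies_killed, battle_won,
                  kill_bonus=10.0, win_bonus=200.0):
    # damage_dealt   -- hit-point damage dealt to enemy units this step
    # enemies_killed -- number of enemy units destroyed this step
    # battle_won     -- True on the step the last enemy unit dies
    reward = damage_dealt + kill_bonus * enemies_killed
    if battle_won:
        reward += win_bonus
    return reward
```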
PyMARL Framework
To facilitate research, the authors release PyMARL, an open-source framework for deep MARL. PyMARL implements several state-of-the-art algorithms, including QMIX, QTRAN, and COMA, alongside baselines such as IQL and VDN. It is modular, readable, and built on PyTorch, making it a convenient starting point for the MARL research community.
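As a flavor of the value-factorization methods PyMARL implements, the sketch below gives a compact PyTorch implementation of a QMIX-style mixing network: per-agent Q-values are combined into a joint Q_tot by a state-conditioned network whose weights come from hypernetworks and are kept non-negative, so Q_tot is monotonic in each agent's Q-value. Layer sizes and shapes are illustrative, not PyMARL's exact configuration.

```python
# QMIX-style mixing network (sketch). Hypernetworks map the global state to the
# mixing weights; absolute values keep the weights non-negative so that Q_tot is
# monotonic in every agent's individual Q-value.
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        batch = agent_qs.size(0)
        agent_qs = agent_qs.view(batch, 1, self.n_agents)
        w1 = torch.abs(self.hyper_w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(batch, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2          # (batch, 1, 1)
        return q_tot.view(batch, 1)
```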
Results and Analysis
The experiments demonstrate the usefulness of the SMAC benchmark by evaluating state-of-the-art algorithms across the suite of scenarios. Training is periodically paused to run 32 test episodes with decentralized greedy action selection, and the median test win percentage across independent runs is reported. Key results include:
- Overall Performance: QMIX significantly outperforms other algorithms across most scenarios, demonstrating superior sample efficiency.
- Scenario Categorization:
- Easy Scenarios: QMIX reaches near-optimal win rates, while IQL and COMA struggle, underscoring the importance of a more expressive centralized value function.
- Hard Scenarios: These demand finer coordination and precise unit control; current methods show mixed success.
- Super-Hard Scenarios: These pose exploration bottlenecks and severe partial observability; all evaluated methods achieve only limited success.
The median test win-percentage curves and per-scenario observations identify where current algorithms fall short, underscoring the need for advances in multi-agent exploration and coordination mechanisms.
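The bookkeeping behind this evaluation is straightforward; the sketch below shows one way to compute it, where `run_test_episode` is a hypothetical placeholder for rolling out the current greedy policies.

```python
# Sketch of the evaluation protocol described above: a fixed number of test
# episodes per checkpoint, then the median test win percentage across
# independent training runs (seeds). `run_test_episode` is hypothetical and
# should return 1 for a won episode and 0 otherwise.
import numpy as np

def test_win_percentage(run_test_episode, n_episodes=32):
    wins = sum(run_test_episode() for _ in range(n_episodes))
    return 100.0 * wins / n_episodes

def median_over_runs(win_percentages):
    # win_percentages: one test win percentage per independent training run
    return float(np.median(win_percentages))
```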
Conclusion and Future Work
The paper concludes by emphasizing SMAC's potential as a long-term benchmark for MARL. Future work intends to introduce more challenging scenarios involving diverse unit types and complex coordination tasks. The authors envision SMAC as a crucial tool to drive systematic and significant advances in MARL research, particularly in multi-agent exploration and coordination.
In providing a realistic, complex, yet structured environment for MARL, SMAC paves the way for more meticulous and comprehensive evaluation of decentralized control algorithms. The open-sourced PyMARL framework further enables reproducibility and extensibility, fostering a collaborative research ecosystem.