
StarCraft II: A New Challenge for Reinforcement Learning (1708.04782v1)

Published 16 Aug 2017 in cs.LG and cs.AI

Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

Authors (25)
  1. Oriol Vinyals (116 papers)
  2. Timo Ewalds (7 papers)
  3. Sergey Bartunov (12 papers)
  4. Petko Georgiev (14 papers)
  5. Alexander Sasha Vezhnevets (12 papers)
  6. Michelle Yeo (10 papers)
  7. Alireza Makhzani (21 papers)
  8. Heinrich Küttler (17 papers)
  9. John Agapiou (3 papers)
  10. Julian Schrittwieser (17 papers)
  11. John Quan (15 papers)
  12. Stephen Gaffney (2 papers)
  13. Stig Petersen (4 papers)
  14. Karen Simonyan (54 papers)
  15. Tom Schaul (42 papers)
  16. Hado van Hasselt (57 papers)
  17. David Silver (67 papers)
  18. Timothy Lillicrap (60 papers)
  19. Kevin Calderone (1 paper)
  20. Paul Keet (1 paper)
Citations (839)

Summary

StarCraft II: A New Challenge for Reinforcement Learning

The paper "StarCraft II: A New Challenge for Reinforcement Learning," authored by researchers from DeepMind and Blizzard, introduces the StarCraft II Learning Environment (SC2LE) as a comprehensive platform for reinforcement learning (RL) research. This environment poses novel and significant challenges for RL, characterized by its multifaceted and complex nature. Specifically, StarCraft II represents a multi-agent, partially observable, stochastic game with a vast action space and long-term credit assignments. These attributes make it an ideal test-bed for advancing RL algorithms and architectures.

Overview of SC2LE

SC2LE is designed to facilitate RL research by providing an interface grounded in StarCraft II, offering both the full game and a suite of mini-games that isolate specific aspects of gameplay. This design allows researchers to benchmark and develop RL algorithms incrementally. The paper details the observation, action, and reward structures unique to StarCraft II, alongside an open-source Python-based interface (PySC2) for seamless integration with RL frameworks.
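A minimal sketch of what interacting with SC2LE through PySC2 looks like is shown below. The constructor arguments (e.g., `AgentInterfaceFormat`, the mini-game name, the step multiplier) vary across PySC2 releases and are illustrative rather than the paper's exact setup.

```python
# Minimal PySC2 usage sketch (argument names differ between PySC2 releases;
# the values below are illustrative, not the paper's exact configuration).
from absl import app
from pysc2.env import sc2_env
from pysc2.lib import actions, features


def main(unused_argv):
    with sc2_env.SC2Env(
        map_name="MoveToBeacon",                      # one of the released mini-games
        players=[sc2_env.Agent(sc2_env.Race.terran)],
        agent_interface_format=features.AgentInterfaceFormat(
            feature_dimensions=features.Dimensions(screen=84, minimap=64)),
        step_mul=8,                                    # act once every 8 game frames
    ) as env:
        timesteps = env.reset()
        while not timesteps[0].last():
            # Do nothing each step; a real agent would choose from the
            # currently available actions in the observation.
            timesteps = env.step([actions.FUNCTIONS.no_op()])


if __name__ == "__main__":
    app.run(main)
```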

Key features of SC2LE include:

  • Observation Space: Uses low-resolution feature layers representing various game elements.
  • Action Space: Closely mimics the human interface; agents act through compound function calls that pair a function identifier with arguments such as screen coordinates (see the sketch after this list).
  • Reward Structure: Provides both sparse win/tie/loss signals and denser game scores from the StarCraft II engine.
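Continuing the sketch above, the snippet below illustrates the compound action format and how the per-step reward is read. The function names come from PySC2's action registry; the screen coordinate is arbitrary and purely illustrative.

```python
# Sketch of SC2LE's compound action format (function id + arguments) and of
# reading the per-step reward; function names are from PySC2's registry.
from pysc2.lib import actions

_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id


def choose_action(timestep):
    """Select the army, then order a move to an (arbitrary) screen coordinate."""
    available = timestep.observation["available_actions"]
    if _MOVE_SCREEN in available:
        # Compound action: "now" (not queued) plus (x, y) screen coordinates.
        return actions.FUNCTIONS.Move_screen("now", (42, 42))
    if _SELECT_ARMY in available:
        return actions.FUNCTIONS.select_army("select")
    return actions.FUNCTIONS.no_op()


# timestep.reward carries either the sparse win/tie/loss signal or the denser
# Blizzard game score, depending on how the environment is configured.
```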

Additionally, SC2LE includes a dataset of human game replays, essential for supervised learning and bootstrapping RL agents with human-like strategies.

Numerical Results and Baseline Agents

The paper presents baseline results using canonical deep RL algorithms, specifically Asynchronous Advantage Actor-Critic (A3C). On the mini-games, these agents learn behaviors comparable to a novice player. On the full game, however, agents fail to win against even the easiest built-in AI, underscoring the substantial difficulty of the environment.

Architectural variants such as Atari-net, FullyConv, and FullyConv LSTM were evaluated:

  • Atari-net: Adapts an architecture used for Atari games, where spatial structure is gradually abstracted away.
  • FullyConv: Preserves spatial resolution throughout the network, which is critical for actions that take screen coordinates as arguments (a minimal sketch follows this list).
  • FullyConv LSTM: Incorporates memory through a convolutional LSTM, crucial for tasks requiring historical context.
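To make the distinction concrete, here is a minimal sketch of the FullyConv idea in PyTorch, assuming preprocessed screen feature planes as input. It is not the authors' exact network, which also consumes minimap and non-spatial features and emits one head per action argument.

```python
# Sketch of the FullyConv idea: keep features at full spatial resolution,
# read the spatial action argument from a 1x1 convolution, and the action
# function / value from a flattened state representation.
import torch
import torch.nn as nn


class FullyConvSketch(nn.Module):
    def __init__(self, in_channels: int, num_functions: int, screen_size: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Spatial policy: one logit per screen location, no resolution loss.
        self.spatial_policy = nn.Conv2d(32, 1, kernel_size=1)
        # Non-spatial heads operate on a flattened state representation.
        flat = 32 * screen_size * screen_size
        self.state_fc = nn.Sequential(nn.Linear(flat, 256), nn.ReLU())
        self.function_policy = nn.Linear(256, num_functions)
        self.value = nn.Linear(256, 1)

    def forward(self, screen_features):
        h = self.trunk(screen_features)                      # (B, 32, H, W)
        spatial_logits = self.spatial_policy(h).flatten(1)   # (B, H*W)
        state = self.state_fc(h.flatten(1))
        return self.function_policy(state), spatial_logits, self.value(state)
```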

Training for 600 million game steps showed that none of the architectures could exploit the game score well enough to defeat the built-in AI on the full game. However, FullyConv and FullyConv LSTM produced promising results on mini-games that focus on specific skills such as combat and resource collection.

Supervised Learning from Replays

The paper also investigates supervised learning from human replays to predict game outcomes and player actions. The dataset comprises 800K game replays spanning a wide range of player skill levels. Using architectures similar to those in the RL experiments, value prediction (i.e., predicting the eventual winner from the current game state) reaches an accuracy of 64%, with accuracy improving as games progress.
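As a rough illustration of the value-prediction task, the value head of the sketch above can be trained with a binary cross-entropy loss on the eventual winner; the replay loading and batching pipeline is assumed here.

```python
# Hedged sketch of the value-prediction (game outcome) objective: predict the
# eventual winner from an observation. Replay loading/batching is assumed.
import torch.nn.functional as F


def outcome_loss(model, screen_features, winner_labels):
    """winner_labels: 1.0 if the observed player eventually won, else 0.0."""
    _, _, value = model(screen_features)              # value logits: (batch, 1)
    return F.binary_cross_entropy_with_logits(
        value.squeeze(-1), winner_labels.float())
```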

Policy prediction accuracy benefited from architectures preserving spatial dependencies. FullyConv and arFullyConv (an autoregressive variant) were effective in predicting the spatial arguments of actions. This demonstration underscores the potential of leveraging human replays to bootstrap RL agents.

Implications and Future Directions

The introduction of SC2LE marks a significant step towards developing more advanced RL algorithms capable of handling complex, real-world-like tasks. The extensive multi-dimensional challenges posed by StarCraft II lay the groundwork for advancements in areas such as multi-agent coordination, partial observability, and long-term planning.

Future developments might focus on:

  • Self-Play: Enabling agents to compete against other RL-trained agents to promote continuous skill improvement.
  • Pixel-Based Observations: Facilitating more naturalistic conditions by utilizing raw RGB pixel inputs akin to human play.
  • Improved Reward Structures: Developing more sophisticated reward signals to better correlate with long-term objectives.
  • Large-Scale Distributed Training: Leveraging distributed systems for scalable training solutions.

In summary, SC2LE provides a rich environment to push the boundaries of RL research, offering both practical and theoretical advancements. The challenges identified in this paper suggest a promising avenue for future work to achieve human-competitive performance in complex real-time strategy games.
