On Reinforcement Learning for Full-length Game of StarCraft (1809.09095v2)

Published 23 Sep 2018 in cs.LG, cs.AI, and stat.ML

Abstract: StarCraft II poses a grand challenge for reinforcement learning. The main difficulties of it include huge state and action space and a long-time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-action automatically extracted from expert's trajectories, which reduces the action space in an order of magnitude yet remains effective. The other is a two-layer hierarchical architecture which is modular and easy to scale, enabling a curriculum transferring from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using restrictive units, we achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93% winning rate of Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days using a single machine with only 48 CPU cores and 8 K40 GPUs. It also shows strong generalization performance, when tested against never seen opponents including cheating levels built-in AI and all levels of Zerg and Protoss built-in AI. We hope this study could shed some light on the future research of large-scale reinforcement learning.

Authors (6)
  1. Zhen-Jia Pang (4 papers)
  2. Ruo-Ze Liu (7 papers)
  3. Zhou-Yu Meng (2 papers)
  4. Yi Zhang (994 papers)
  5. Yang Yu (385 papers)
  6. Tong Lu (85 papers)
Citations (84)

Summary

Overview of Reinforcement Learning for StarCraft II

The paper "On Reinforcement Learning for Full-length Game of StarCraft" extends the domain of reinforcement learning (RL) techniques to tackle the complex, multi-agent, and real-time strategy (RTS) game of StarCraft II. Existing challenges in this domain include a massive state and action space, imperfect information, and long episodes. The authors propose a suite of methods to address these issues, centered around hierarchical reinforcement learning.

Key Contributions

The authors introduce a two-level hierarchical reinforcement learning approach that decomposes the decision-making process to make the large-scale StarCraft II problem tractable:

  1. Macro-actions and Reduced Action Space: Macro-actions are extracted automatically from expert gameplay trajectories. They reduce the action space by an order of magnitude while remaining effective for decision-making, which makes learning markedly more efficient.
  2. Hierarchical Architecture: A two-layer scheme in which a high-level controller policy selects among sub-policies at fixed intervals, and the active sub-policy issues finer-grained actions within its assigned role. This setup is modular, keeps the per-policy state-action space manageable, and is easy to scale (a minimal sketch follows this list).
  3. Curriculum Transfer Learning: The RL agent is trained incrementally against progressively more difficult built-in AI levels, transferring what it has learned on simpler tasks to harder ones. This substantially improves learning in the full-length game.
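
The control flow implied by points 1 and 2 can be made concrete with a short sketch. This is a minimal illustration, not the authors' implementation: the network sizes, the number of sub-policies, the size of the macro-action set, and the decision interval are all assumptions, and the real agent observes PySC2 feature layers and is trained with RL rather than acting with untrained networks.

```python
# Minimal sketch of the two-layer hierarchy (illustrative names and sizes only).
import torch
import torch.nn as nn

OBS_DIM = 64              # assumption: flattened global observation features
NUM_MACRO_ACTIONS = 20    # assumption: size of the mined macro-action set
NUM_SUB_POLICIES = 3      # assumption: e.g. economy / technology / combat roles
CONTROLLER_INTERVAL = 8   # assumption: high-level decision every K env steps


class Controller(nn.Module):
    """High-level policy: chooses which sub-policy is active for the next interval."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_SUB_POLICIES))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class SubPolicy(nn.Module):
    """Low-level policy: picks a macro-action from the reduced action set."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_MACRO_ACTIONS))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class HierarchicalAgent:
    """Re-selects a sub-policy at a fixed interval; the active sub-policy then
    emits one macro-action per step until the controller decides again."""
    def __init__(self):
        self.controller = Controller()
        self.sub_policies = [SubPolicy() for _ in range(NUM_SUB_POLICIES)]
        self.active = 0

    def act(self, obs, t):
        if t % CONTROLLER_INTERVAL == 0:
            self.active = self.controller(obs).sample().item()
        macro_action = self.sub_policies[self.active](obs).sample().item()
        return self.active, macro_action


# Usage with dummy observations standing in for real game features.
agent = HierarchicalAgent()
for t in range(16):
    obs = torch.randn(OBS_DIM)
    sub_id, macro_id = agent.act(obs, t)
```

In the actual system, the controller and the sub-policies are trained with reinforcement learning on their respective reward signals, which this sketch omits.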

Experimental Results

Experiments demonstrate significant gains in agent performance. On a 64x64 map, the hierarchical RL agent trained with curriculum learning achieved an over 93% win rate against the level-7 built-in Terran AI, the hardest non-cheating difficulty, after roughly two days of training on a single machine with 48 CPU cores and 8 K40 GPUs. The hierarchical model also clearly outperformed single-policy architectures in the higher-difficulty settings, underscoring the benefits of the proposed modular approach.
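
The curriculum transfer procedure can also be sketched at a high level. The staging and promotion criterion below are assumptions for illustration (the paper describes moving from simpler to harder opponents while reusing the learned policies); train_against and evaluate are hypothetical placeholders for the actual PySC2 training and evaluation loops.

```python
# Illustrative sketch of curriculum transfer learning; not the authors' code.

def train_against(agent, difficulty, episodes=1000):
    """Placeholder: run one RL training stage versus the built-in AI at this level."""
    # Real code would play `episodes` games and update the agent's policies.
    pass


def evaluate(agent, difficulty, episodes=100):
    """Placeholder: return the agent's observed win rate at this level."""
    return 0.0  # real code: fraction of games won


def curriculum_training(agent, levels=range(1, 8), stages_per_level=3, promote_at=0.9):
    """Train against built-in AI levels 1..7 in order. The agent's parameters
    persist across stages, so skills learned on easy opponents transfer to hard
    ones. The promotion threshold here is an assumption, not taken from the paper."""
    for level in levels:
        for _ in range(stages_per_level):
            train_against(agent, level)
            if evaluate(agent, level) >= promote_at:
                break  # good enough at this level; move on to a harder opponent
    return agent
```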

Implications and Future Directions

The proposed hierarchical approach not only establishes a strong benchmark for reinforcement learning on StarCraft II but also suggests new directions for large-scale RL problems with analogous characteristics. The modular nature of the architecture accommodates expansion to new sub-tasks or domains by adapting or replacing specific sub-modules.

The paper opens several avenues for further research: optimizing how macro-actions are selected, refining sub-policy learning in multi-agent settings, and extending the curriculum learning scheme to broader and more diverse RTS environments. Together, these directions could extend what RL agents can accomplish in complex strategic settings while balancing scalability against task-specific structure.

Conclusion

This work sheds light on the practical application of reinforcement learning in intricate RTS games, offering novel methodologies to manage large-scale decision processes. The empirical success with StarCraft II demonstrates the potential of these frameworks for use in other domains requiring strategic, adaptive AI applications.
