- The paper demonstrates a deep reinforcement learning framework that masters full MOBA games, achieving win rates of 95.2% in professional matches and 97.7% in public games.
- The paper introduces key innovations including Curriculum Self-Play Learning, Off-Policy Adaption, Multi-Head Value Estimation, and neural MCTS drafting to effectively tackle the complexities of multi-agent interactions.
- The work provides valuable insights for scaling AI to real-time strategic challenges, with implications extending to robotics and other collaborative systems.
Understanding the AI Advancements in MOBA Game Playing
The paper "Towards Playing Full MOBA Games with Deep Reinforcement Learning" presents a substantial advance in applying AI to multiplayer online battle arena (MOBA) games such as Honor of Kings, League of Legends, and Dota 2. These games involve complex multi-agent cooperation and competition, demanding long-horizon strategic planning and precise execution; they have historically been hard for AI because of the combinatorial explosion of state-action spaces and hero-lineup combinations.
Methodological Innovations
The authors introduce a MOBA AI learning paradigm based on deep reinforcement learning (DRL) that enables an agent to master full MOBA games without the hero-pool restrictions of previous systems such as OpenAI Five, which was limited to 17 heroes. The paradigm integrates several techniques:
- Curriculum Self-Play Learning (CSPL): This approach structures training as progressively harder tasks. It first trains smaller teacher models on fixed hero lineups, then merges their policies into a single student model via multi-teacher policy distillation before continuing self-play on larger hero pools.
- Off-Policy Adaption: In large-scale distributed training, the trajectories an agent learns from are generated by slightly stale copies of its policy. The system applies off-policy corrections so these policy deviations do not destabilize learning over long training horizons.
- Multi-Head Value Estimation (MHV): The game reward is decomposed into components, each estimated by a separate value head; the combined estimate gives agents a more nuanced training signal for the different aspects of game strategy.
- Monte-Carlo Tree Search (MCTS): Hero drafting over a large pool is itself a hard combinatorial problem, so a neural-network-guided MCTS drafting agent selects the lineup, making search over the expansive pick space computationally tractable.
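The multi-teacher distillation at the heart of CSPL can be sketched in a few lines. The snippet below is an illustrative simplification, not the paper's exact formulation: `distillation_loss` averages the cross-entropy between each fixed teacher's action distribution and the student's, and is minimized when the student reproduces the teachers' behavior.

```python
import math

def softmax(logits):
    """Turn raw action logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs):
    """Average cross-entropy between each teacher's action distribution
    and the student's. Minimizing it pulls one student policy toward
    several fixed-lineup teacher policies at once."""
    student = softmax(student_logits)
    loss = 0.0
    for probs in teacher_probs:
        loss += -sum(p * math.log(s) for p, s in zip(probs, student))
    return loss / len(teacher_probs)
```

In practice the teachers and student are neural networks and the loss is averaged over sampled game states; this scalar version only shows the shape of the objective.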
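Off-policy corrections of this kind are commonly implemented as clipped importance-weighted objectives; the authors' earlier Honor of Kings work used a dual-clip variant of PPO, sketched below for a single sample. The parameter names `eps` and `c` follow common PPO conventions and are illustrative, not taken from the paper.

```python
def dual_clip_objective(ratio, advantage, eps=0.2, c=3.0):
    """Dual-clip PPO surrogate for one (state, action) sample.
    ratio = pi_new(a|s) / pi_old(a|s). Standard PPO clips the ratio to
    [1 - eps, 1 + eps]; the extra clip bounds the objective below by
    c * advantage when the advantage is negative, so a very large ratio
    from stale off-policy data cannot blow up the gradient."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    if advantage < 0:
        surrogate = max(surrogate, c * advantage)
    return surrogate
```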
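Multi-head value estimation can likewise be sketched simply: each reward component gets its own discounted-return target, and the heads are combined into one scalar. The head weights below are placeholders, not the paper's exact decomposition or weighting.

```python
def per_head_returns(rewards_per_step, gamma=0.99):
    """Discounted-return target for each head, computed independently.
    rewards_per_step: one reward vector (one entry per head) per step."""
    running = [0.0] * len(rewards_per_step[0])
    targets = []
    for step in reversed(rewards_per_step):
        running = [r + gamma * g for r, g in zip(step, running)]
        targets.append(list(running))
    targets.reverse()
    return targets

def combined_value(head_values, weights):
    """Weighted sum of per-head value estimates into one scalar."""
    return sum(w * v for w, v in zip(weights, head_values))
```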
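The drafting agent's search can be illustrated with the UCB1 rule used to decide which hero to explore next in the draft tree. This is a minimal sketch: in the paper, positions are evaluated by a trained win-rate predictor rather than random rollouts, and the full agent also tracks whose turn it is in the pick sequence.

```python
import math

def ucb_select(candidates, stats, total_visits, c=1.4):
    """Choose the next hero to explore under UCB1.
    stats maps hero -> (visit_count, total_value); unvisited heroes
    are expanded first, otherwise the hero maximizing
    mean value + exploration bonus is chosen."""
    best_hero, best_score = None, -math.inf
    for hero in candidates:
        visits, value = stats.get(hero, (0, 0.0))
        if visits == 0:
            return hero  # expand unvisited picks first
        score = value / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_hero, best_score = hero, score
    return best_hero
```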
Numerical Performance and Testing Scope
This AI system was evaluated through both professional matches and large-scale testing with the public:
- Professional Matches: Across 42 matches against esports professionals, the AI achieved a 95.2% win rate (40 of 42), demonstrating superhuman play.
- Public Matches: The AI achieved a 97.7% win rate over 642,047 games against top-ranking players, underscoring its effectiveness across diverse high-level strategies.
These tests mark a significant increase in scale over earlier AI evaluations on games such as StarCraft II and Dota 2, confirming the robustness and adaptability of the approach.
Implications and Future Directions
This research contributes substantial theoretical and practical insights into AI’s feasibility in solving complex strategic problems with real-time requirements. The innovations in DRL scalability are likely to transcend MOBA games, offering utility in robotics and other real-time collaborative systems. The work stimulates further research into efficient methods to scale AI learning processes, particularly in high-dimensional environments, and paves the way for a more complete understanding of multi-agent cooperation and competition dynamics.
The authors state their intention to extend the AI to all 101 heroes in Honor of Kings, further refining the approach toward complete mastery of the game. Their methodology also suggests sub-tasks of the MOBA-playing problem that could serve as useful benchmarks for the broader AI community.
Overall, this paper provides a comprehensive view of the possibilities and challenges that come with mastering MOBA games using artificial intelligence, contributing a meaningful step forward in strategic game AI research.