
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning (1912.09729v3)

Published 20 Dec 2019 in cs.AI and cs.LG

Abstract: We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and the Atari series, which makes it very difficult to search for any policy with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top professional human players in full 1v1 games.

Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

The paper presents a methodological advance in using deep reinforcement learning (DRL) for complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. It examines the challenges posed by the vast state and action spaces of MOBA games relative to more traditional 1v1 games such as Go, chess, and the Atari series. To address this complexity, the researchers develop a comprehensive DRL framework spanning both system architecture and algorithmic design, enabling an artificial agent to reach a level of play that surpasses top professional human players in the popular MOBA game "Honor of Kings."

System Architecture and Methodology

The researchers have designed a scalable, loosely coupled system architecture that makes effective use of large-scale computing resources. The system decouples experience generation from parameter learning, enabling efficient off-policy DRL. Its components include an RL Learner, AI Server, Dispatch Module, and Memory Pool, which together form a high-throughput training pipeline that scales to large numbers of CPU cores and GPUs.
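As a concrete illustration of that decoupling, here is a minimal Python sketch of the actor/learner split around a shared memory pool. All names and the toy environment are illustrative stand-ins for the paper's components, not the actual released implementation:

```python
import random
from collections import deque


class MemoryPool:
    """Replay storage that decouples experience generation from learning."""

    def __init__(self, capacity=10_000):
        self._buf = deque(maxlen=capacity)

    def push(self, transitions):
        # Called on behalf of AI Servers (via a dispatch step).
        self._buf.extend(transitions)

    def sample(self, batch_size):
        # Called by the RL Learner, asynchronously from the actors.
        return random.sample(list(self._buf), min(batch_size, len(self._buf)))


def ai_server_rollout(policy, env_step, obs, steps=8):
    """One AI Server fragment: act with the current policy snapshot."""
    transitions = []
    for _ in range(steps):
        action = policy(obs)
        next_obs, reward = env_step(obs, action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs
    return transitions


if __name__ == "__main__":
    pool = MemoryPool()
    # Toy environment: the state is an integer; reward favors action == state % 2.
    env_step = lambda s, a: (s + 1, 1.0 if a == s % 2 else 0.0)
    policy = lambda s: random.randint(0, 1)  # stand-in for the actor network

    for _ in range(4):  # many AI Servers run such rollouts in parallel in the real system
        pool.push(ai_server_rollout(policy, env_step, obs=0))
    batch = pool.sample(16)  # the learner consumes batches off-policy
    print(f"sampled {len(batch)} transitions")
```

Because the learner samples from the pool rather than waiting on the actors, experience generation and parameter updates scale independently, which is what makes the training off-policy.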

For algorithmic design, the paper proposes a novel actor-critic approach tailored to the MOBA 1v1 setting. The network encodes multi-modal state inputs and employs a variety of strategies to contend with the game's complexities. Key innovations include:

  • Control Dependency Decoupling: Treats the components of each compound action (e.g., the action button, movement offsets, and target) as independent labels, shrinking the effective action space and simplifying the network architecture.
  • Action Masking: A game-knowledge-based pruning mechanism that removes infeasible actions (e.g., casting a skill still on cooldown) from the policy's output according to prior expert knowledge, guiding exploration and speeding up training; see the sketch after this list.
  • Target Attention Mechanism: Focuses the policy on relevant game units when selecting a target, which is critical for decision-making in the dynamic MOBA environment.
  • Dual-Clip PPO Algorithm: An adaptation of Proximal Policy Optimization that adds a second clip to bound the objective when advantages are negative, ensuring training stability and convergence in off-policy settings with varied data sources; also sketched below.
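A short numerical sketch makes three of these pieces concrete. The function names and the ε = 0.2, c = 3.0 defaults below are illustrative choices under the standard formulations of these techniques, not the authors' released code:

```python
import numpy as np


def masked_policy(logits, feasible):
    """Action mask: infeasible actions get -inf logits and therefore exactly
    zero probability, so exploration never samples illegal moves."""
    masked = np.where(feasible, logits, -np.inf)
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()


def target_attention_scores(query, unit_keys):
    """Scaled dot-product scores over candidate game units; a softmax over
    these scores concentrates the policy on the most relevant target."""
    return unit_keys @ query / np.sqrt(query.size)


def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample dual-clip PPO objective (to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s). Standard PPO clips the ratio to
    [1 - eps, 1 + eps]; the dual clip additionally floors the objective at
    c * A (with c > 1) when the advantage A is negative, bounding how far a
    stale off-policy sample can drag the update.
    """
    standard = np.minimum(ratio * advantage,
                          np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
    return np.where(advantage < 0.0,
                    np.maximum(standard, c * advantage),
                    standard)


if __name__ == "__main__":
    # Masked policy: the second (infeasible) action gets zero probability.
    print(masked_policy(np.array([1.0, 0.5, -0.2]),
                        np.array([True, False, True])))

    # Target attention: pick the unit whose key best matches the query.
    q, keys = np.ones(4), np.random.randn(3, 4)
    print(int(target_attention_scores(q, keys).argmax()))

    # Dual clip: ratio 5 with advantage -2 gives -10 under the standard
    # clipped objective; the floor c * A = -6 bounds the update instead.
    print(dual_clip_ppo_objective(np.array([5.0]), np.array([-2.0])))  # [-6.]
```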

Results and Implications

Empirical validation detailed in the paper underscores the effectiveness of the proposed DRL framework. In experimental settings, the AI agent, dubbed Tencent Solo, convincingly defeated top professional players in the MOBA game "Honor of Kings." The evaluations covered multiple hero types, each with unique skills and mechanics, demonstrating the agent's robust and versatile control across different character archetypes.

Moreover, the paper reports strong performance both in head-to-head matches against top amateur and professional players and in public exhibition settings. The AI consistently achieved high win rates against top-tier human competition, demonstrating the high-level strategy, adaptability, and micro-management skills previously attributed only to peak human play.

Future Directions

While the paper delivers a functional and proficient AI for 1v1 MOBA control tasks, it invites further exploration of multi-agent extensions and other MOBA formats. The open-source framework envisioned by the authors could serve as a valuable resource for the research community in advancing AI capabilities in similarly complex environments.

Overall, this work makes significant contributions toward the development of autonomous agents in competitive, complex gaming environments, extending the applicability of DRL methodologies to intricate decision-making domains.

Authors (18)
  1. Deheng Ye (50 papers)
  2. Zhao Liu (97 papers)
  3. Mingfei Sun (30 papers)
  4. Bei Shi (10 papers)
  5. Peilin Zhao (127 papers)
  6. Hao Wu (623 papers)
  7. Hongsheng Yu (3 papers)
  8. Shaojie Yang (7 papers)
  9. Xipeng Wu (2 papers)
  10. Qingwei Guo (3 papers)
  11. Qiaobo Chen (3 papers)
  12. Yinyuting Yin (2 papers)
  13. Hao Zhang (947 papers)
  14. Tengfei Shi (6 papers)
  15. Liang Wang (512 papers)
  16. Qiang Fu (159 papers)
  17. Wei Yang (349 papers)
  18. Lanxiao Huang (16 papers)
Citations (286)