- The paper introduces Honor of Kings Arena, a new RL testbed derived from a MOBA game that simulates complex multi-agent, imperfect information settings.
- It provides a Python-based API to configure observation, action, and reward spaces, facilitating efficient benchmarking of reinforcement learning algorithms.
- Experimental results with PPO and Ape-X DQN reveal significant challenges in policy generalization, underlining the need for advanced multi-task and transfer learning strategies.
Honor of Kings Arena: An Environment for Generalization in Competitive Reinforcement Learning
The paper introduces "Honor of Kings Arena," a reinforcement learning environment derived from Honor of Kings, a popular multiplayer online battle arena (MOBA) game. The environment is designed to evaluate and improve generalization in competitive reinforcement learning (RL). Unlike traditional board games or simpler environments, Honor of Kings Arena presents a multi-agent, imperfect-information setting that demands sophisticated strategies over extended time horizons against diverse agents and objectives.
Key Features and Contributions
The primary contributions of this work are summarized as follows:
- Environment Complexity: Honor of Kings Arena simulates a challenging MOBA 1v1 setting with over 20 playable heroes, each with distinct roles, skills, and strategies. This yields a diverse family of control tasks: each hero the agent controls requires different micro-management, and each opponent hero demands different counter-strategies.
- APIs and Interface: The authors provide a Python-based interface for seamless interaction with the game engine, allowing researchers to configure observation spaces, action spaces, and reward structures. This facilitates efficient reinforcement learning tasks and benchmarking.
- Generalization Challenges: The paper underscores two main challenges:
- Across Opponents: The ability to generalize when facing different opponent heroes in the same target role.
- Across Targets: The adaptability required when controlling different hero targets against a static opponent role.
- Benchmarking and Performance: Preliminary results with popular RL algorithms like PPO and Ape-X DQN are provided. These benchmarks reveal that existing methods struggle with the inherent complexities, emphasizing the need for innovative RL models that can generalize across diverse scenarios.
- Research Implications: Honor of Kings Arena extends beyond simple RL environments, offering an open-source testbed for new algorithms exploring generalization in complex competitive settings. It pinpoints the inadequacies of current models in coping with diverse game states and emphasizes the necessity for future research on policy transferability and robust RL training methods.
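The Python interface described above can be pictured as a standard Gym-style interaction loop. The code below is only an illustrative sketch: the class `StubArenaEnv`, its constructor arguments, and the `reset`/`step` method names are assumptions modeled on common RL environment conventions, not the environment's actual API.

```python
import random

class StubArenaEnv:
    """Minimal stand-in for a configurable 1v1 MOBA environment.

    hero_id / opponent_id are hypothetical placeholders standing in for
    whatever hero-selection mechanism the real interface exposes.
    """

    def __init__(self, hero_id: int, opponent_id: int):
        self.hero_id = hero_id
        self.opponent_id = opponent_id
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        return [0.0] * 8  # placeholder observation vector

    def step(self, action: int):
        """Advance one tick; return (obs, reward, done, info)."""
        self.t += 1
        obs = [random.random() for _ in range(8)]
        reward = 1.0 if action == 0 else -0.1  # toy shaped reward
        done = self.t >= 100
        return obs, reward, done, {}


def rollout(env, policy, max_steps=100):
    """Run one episode under `policy` and return the total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# Toy usage: a trivial policy that always emits action 0.
env = StubArenaEnv(hero_id=1, opponent_id=2)
ret = rollout(env, policy=lambda obs: 0)
```

The point of the sketch is the shape of the loop, not the stub's internals: an agent trained against one `opponent_id` (or controlling one `hero_id`) can be re-evaluated under a different configuration simply by constructing the environment with different arguments, which is what makes the across-opponents and across-targets generalization tests straightforward to set up.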
Experimental Observations
The researchers conducted several experimental setups to assess the generalizability of trained models:
- Both PPO and DQN agents learned to defeat the built-in behavior-tree (BT) baselines, though the margin of victory was inconsistent across heroes.
- Direct transfer of policies across different tasks (e.g., same hero with different opponents, or different heroes against the same opponent) highlighted weaknesses in policy generalization.
- Multi-task learning and model distillation were proposed as potential remedies, demonstrating improved performance across multiple tasks.
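One common formulation of the model distillation mentioned above is to train a single student policy to match the action distributions of several task-specific teachers by minimizing an averaged KL divergence. The sketch below illustrates that objective with toy logits; the function names and the averaging scheme are assumptions for illustration, not the paper's actual training setup.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits_per_task, student_logits):
    """Average KL from each teacher's action distribution to the student's.

    In multi-task distillation, one teacher is trained per task (e.g. per
    hero) and a single shared student is regressed onto all of them.
    """
    student = softmax(student_logits)
    losses = [kl_divergence(softmax(t), student)
              for t in teacher_logits_per_task]
    return sum(losses) / len(losses)

# Toy example: two teachers that both prefer action 0.
teachers = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2]]
loss_far = distillation_loss(teachers, [0.0, 0.0, 0.0])    # uniform student
loss_near = distillation_loss(teachers, [1.9, 0.6, 0.15])  # student near teachers
```

A student whose logits sit close to the teachers' incurs a much smaller loss than a uniform student, which is exactly the gradient signal that pulls one shared policy toward competence on all tasks at once.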
Future Directions
The paper lays a foundation for addressing generalization in multi-agent RL environments and opens avenues for algorithms that handle diverse action spaces and dynamically changing conditions. Because the Arena supports many hero roles and game mechanics, future work can build on these findings to explore multi-agent coordination, strategic planning, and the learning of transferable skills.
In conclusion, "Honor of Kings Arena" significantly broadens the landscape for competitive RL environments, presenting both formidable challenges and invaluable opportunities for advancing artificial intelligence in gaming and beyond.