- The paper introduces Honor of Kings Arena, a new RL testbed derived from a MOBA game that simulates complex multi-agent, imperfect information settings.
- It provides a Python-based API to configure observation, action, and reward spaces, facilitating efficient benchmarking of reinforcement learning algorithms.
- Experimental results with PPO and Ape-X DQN reveal significant challenges in policy generalization, underlining the need for advanced multi-task and transfer learning strategies.
Honor of Kings Arena: An Environment for Generalization in Competitive Reinforcement Learning
The paper introduces "Honor of Kings Arena," a reinforcement learning environment derived from Honor of Kings, a popular multiplayer online battle arena (MOBA) game. The environment is designed to evaluate and improve generalization in competitive reinforcement learning (RL). Unlike traditional board games or simpler environments, Honor of Kings Arena presents a multi-agent, imperfect-information setting that demands sophisticated strategies over extended time horizons against diverse agents and objectives.
Key Features and Contributions
The primary contributions of this work are summarized as follows:
- Environment Complexity: Honor of Kings Arena simulates a challenging MOBA 1v1 setting with over 20 playable heroes, each with distinct roles, skills, and strategies. This yields a diverse family of control tasks: each hero the agent controls requires different micro-management, and each opponent hero demands different counter-strategies.
- APIs and Interface: The authors provide a Python-based interface for seamless interaction with the game engine, allowing researchers to configure observation spaces, action spaces, and reward structures. This facilitates efficient reinforcement learning tasks and benchmarking.
- Generalization Challenges: The paper underscores two main challenges:
- Across Opponents: The ability to generalize when facing different opponent heroes in the same target role.
- Across Targets: The adaptability required when controlling different hero targets against a static opponent role.
- Benchmarking and Performance: Preliminary results with popular RL algorithms like PPO and Ape-X DQN are provided. These benchmarks reveal that existing methods struggle with the inherent complexities, emphasizing the need for innovative RL models that can generalize across diverse scenarios.
- Research Implications: Honor of Kings Arena extends beyond simple RL environments, offering an open-source testbed for new algorithms exploring generalization in complex competitive settings. It pinpoints the inadequacies of current models in coping with diverse game states and emphasizes the necessity for future research on policy transferability and robust RL training methods.
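The Python interface described above can be pictured as a standard Gym-style interaction loop. The code below is only an illustrative sketch: the class `StubArenaEnv`, its constructor arguments, and the `reset`/`step` method names are assumptions modeled on common RL environment conventions, not the environment's actual API.

```python
import random

class StubArenaEnv:
    """Minimal stand-in for a configurable 1v1 MOBA environment.

    hero_id / opponent_id are hypothetical placeholders standing in for
    whatever hero-selection mechanism the real interface exposes.
    """

    def __init__(self, hero_id: int, opponent_id: int):
        self.hero_id = hero_id
        self.opponent_id = opponent_id
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        return [0.0] * 8  # placeholder observation vector

    def step(self, action: int):
        """Advance one tick; return (obs, reward, done, info)."""
        self.t += 1
        obs = [random.random() for _ in range(8)]
        reward = 1.0 if action == 0 else -0.1  # toy shaped reward
        done = self.t >= 100
        return obs, reward, done, {}


def rollout(env, policy, max_steps=100):
    """Run one episode under `policy` and return the total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# Toy usage: a trivial policy that always emits action 0.
env = StubArenaEnv(hero_id=1, opponent_id=2)
ret = rollout(env, policy=lambda obs: 0)
```

The point of the sketch is the shape of the loop, not the stub's internals: an agent trained against one `opponent_id` (or controlling one `hero_id`) can be re-evaluated under a different configuration simply by constructing the environment with different arguments, which is what makes the across-opponents and across-targets generalization tests straightforward to set up.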
Experimental Observations
The researchers conducted several experimental setups to assess the generalizability of trained models:
- Both PPO and DQN agents learned to defeat the built-in behavior-tree (BT) baselines, though the margin of victory was inconsistent across heroes.
- Direct transfer of policies across different tasks (e.g., same hero with different opponents, or different heroes against the same opponent) highlighted weaknesses in policy generalization.
- Multi-task learning and model distillation were proposed as potential remedies, demonstrating improved performance across multiple tasks.
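One common formulation of the model distillation mentioned above is to train a single student policy to match the action distributions of several task-specific teachers by minimizing an averaged KL divergence. The sketch below illustrates that objective with toy logits; the function names and the averaging scheme are assumptions for illustration, not the paper's actual training setup.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits_per_task, student_logits):
    """Average KL from each teacher's action distribution to the student's.

    In multi-task distillation, one teacher is trained per task (e.g. per
    hero) and a single shared student is regressed onto all of them.
    """
    student = softmax(student_logits)
    losses = [kl_divergence(softmax(t), student)
              for t in teacher_logits_per_task]
    return sum(losses) / len(losses)

# Toy example: two teachers that both prefer action 0.
teachers = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2]]
loss_far = distillation_loss(teachers, [0.0, 0.0, 0.0])    # uniform student
loss_near = distillation_loss(teachers, [1.9, 0.6, 0.15])  # student near teachers
```

A student whose logits sit close to the teachers' incurs a much smaller loss than a uniform student, which is exactly the gradient signal that pulls one shared policy toward competence on all tasks at once.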
Future Directions
The paper lays a foundation for addressing generalization in multi-agent RL environments and opens avenues for algorithms that handle diverse action spaces and dynamically changing conditions. Because the Arena supports many hero roles and game mechanics, future work can build on these findings to explore multi-agent coordination, strategic planning, and the learning of transferable skills.
In conclusion, "Honor of Kings Arena" significantly broadens the landscape for competitive RL environments, presenting both formidable challenges and invaluable opportunities for advancing artificial intelligence in gaming and beyond.