
TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning (2011.12895v2)

Published 25 Nov 2020 in cs.LG, cs.AI, and cs.MA

Abstract: Competitive Self-Play (CSP) based Multi-Agent Reinforcement Learning (MARL) has shown phenomenal breakthroughs recently. Strong AIs are achieved for several benchmarks, including Dota 2, Glory of Kings, Quake III, StarCraft II, to name a few. Despite the success, the MARL training is extremely data thirsty, requiring typically billions of (if not trillions of) frames be seen from the environment during training in order for learning a high performance agent. This poses non-trivial difficulties for researchers or engineers and prevents the application of MARL to a broader range of real-world problems. To address this issue, in this manuscript we describe a framework, referred to as TLeague, that aims at large-scale training and implements several main-stream CSP-MARL algorithms. The training can be deployed in either a single machine or a cluster of hybrid machines (CPUs and GPUs), where the standard Kubernetes is supported in a cloud native manner. TLeague achieves a high throughput and a reasonable scale-up when performing distributed training. Thanks to the modular design, it is also easy to extend for solving other multi-agent problems or implementing and verifying MARL algorithms. We present experiments over StarCraft II, ViZDoom and Pommerman to show the efficiency and effectiveness of TLeague. The code is open-sourced and available at https://github.com/tencent-ailab/tleague_projpage

TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning

The paper "TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning" presents an infrastructure designed to efficiently handle the demanding computational requirements of Competitive Self-Play (CSP) in Multi-Agent Reinforcement Learning (MARL). Developed by researchers at Tencent Robotics X and Tsinghua University, TLeague addresses the significant data demands of MARL by enabling large-scale, distributed training.

Key Contributions

The paper outlines several critical contributions to the field of MARL:

  1. Scalability and Modular Design: TLeague supports both single and cluster environments, utilizing Kubernetes for cloud-native deployment. This flexibility is critical for scaling up the training processes, which often require extensive computational resources, including tens of thousands of CPU cores and hundreds of GPUs.
  2. High Throughput Distributed Training: With its ability to maximize the utilization of hybrid machines (comprising CPUs and GPUs), TLeague achieves high throughput, making large-scale MARL experiments feasible. The use of Horovod for synchronous gradient updates further optimizes resource utilization.
  3. Extensibility and Versatility: TLeague's modular design facilitates easy extension for new multi-agent problems and supports mainstream CSP-MARL algorithms. This adaptability makes it a robust choice for researchers looking to deploy MARL in diverse environments and applications.

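The actor/learner split and the synchronous gradient updates described above can be sketched in a few lines of plain Python. This is a toy illustration only: the class and function names (`LeagueManager`, `actor_rollout`, `synchronous_update`) are assumptions for exposition, not TLeague's actual API, and the scalar "policy" stands in for real network parameters.

```python
import random
from statistics import mean

class LeagueManager:
    """Tracks historical model checkpoints and assigns opponents (toy version)."""
    def __init__(self):
        self.checkpoints = [0.0]  # each checkpoint is a toy scalar "policy"

    def sample_opponent(self):
        # Naive self-play: always match against the latest checkpoint.
        return self.checkpoints[-1]

    def publish(self, params):
        self.checkpoints.append(params)

def actor_rollout(params, opponent):
    """Stand-in for an environment rollout that returns a noisy gradient."""
    return (opponent - params) + random.gauss(0, 0.1)

def synchronous_update(params, grads, lr=0.5):
    """Average gradients across parallel actors, as a Horovod-style
    synchronous allreduce would, then take one optimizer step."""
    return params + lr * mean(grads)

league = LeagueManager()
params = 0.0
for _ in range(10):
    opponent = league.sample_opponent()
    grads = [actor_rollout(params, opponent) for _ in range(8)]  # 8 parallel actors
    params = synchronous_update(params, grads)
    league.publish(params)
```

In the real system the rollout workers run on CPU machines, the learner on GPU machines, and Kubernetes schedules both; the loop above only shows how their outputs fit together.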
Numerical Results and Experiments

The paper discusses experiments carried out using TLeague on environments like StarCraft II, ViZDoom, and Pommerman, showcasing its efficiency and effectiveness. For instance:

  • StarCraft II: Demonstrated on the Zerg-vs-Zerg full game, highlighting the framework's ability to handle complex, strategic video games.
  • ViZDoom: The framework trained agents that outperformed both the built-in bots and previous champions of the ViZDoom Competition track, establishing TLeague's competitive strength.
  • Pommerman: In the 2vs2 environment from the NeurIPS 2018 Pommerman competition, TLeague-trained agents achieved superior performance, further demonstrating the framework's capability in mixed cooperative-competitive settings.

Theoretical Implications

The implementation of Fictitious Self-Play (FSP) within TLeague is grounded in game theory: fictitious play is known to converge to a Nash equilibrium in certain classes of games, giving the training process a theoretically sound basis. By sampling opponents from the pool of historical policies rather than only the most recent one, this approach mitigates issues commonly encountered in MARL, such as non-stationary dynamics and policy forgetting.
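The contrast between naive self-play and FSP-style opponent sampling can be shown with a small sketch. The pool representation and uniform sampling below are simplifying assumptions for illustration, not TLeague's exact sampling strategy (which supports several schemes).

```python
import random

def naive_sp_opponent(pool):
    # Naive self-play: always train against the most recent policy.
    # This is prone to strategy cycling and to forgetting how to
    # beat older opponents.
    return pool[-1]

def fsp_opponent(pool):
    # FSP-style sampling: draw uniformly from all historical
    # checkpoints, so the learner best-responds to the opponent's
    # average strategy rather than only its latest one.
    return random.choice(pool)

pool = [f"checkpoint_{i}" for i in range(100)]
counts = {}
for _ in range(10_000):
    opp = fsp_opponent(pool)
    counts[opp] = counts.get(opp, 0) + 1
# Under FSP sampling, every historical checkpoint keeps reappearing
# as an opponent, which is what counters policy forgetting.
```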

Practical Implications and Future Directions

From a practical perspective, TLeague's architecture is well-suited for industries and applications where MARL solutions are applicable but previously hindered by computational constraints. Future work could explore the application of this framework to even broader scales and more complex problems, such as strategic military simulations or real-world robotics challenges.

In conclusion, TLeague represents a robust and scalable solution for CSP-MARL, addressing both theoretical and practical challenges in today's AI research landscape. Its open-source nature and design flexibility make it a valuable tool for advancing the frontiers of multi-agent interactions and learning algorithms.

Authors (8)
  1. Peng Sun
  2. Jiechao Xiong
  3. Lei Han
  4. Xinghai Sun
  5. Shuxing Li
  6. Jiawei Xu
  7. Meng Fang
  8. Zhengyou Zhang
Citations (19)