JaxMARL: Multi-Agent RL Environments and Algorithms in JAX (2311.10090v5)
Abstract: Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research. Traditionally, RL environments run on the CPU, which limits their scalability with typical academic compute. However, recent advancements in JAX have enabled wider use of hardware acceleration, making massively parallel RL training pipelines and environments possible. While this has been successfully applied to single-agent RL, it has not yet been widely adopted for multi-agent RL (MARL). In this paper, we present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments and popular baseline algorithms. Our experiments show that, in terms of wall-clock time, our JAX-based training pipeline is around 14 times faster than existing approaches, and up to 12,500 times faster when multiple training runs are vectorized. This enables efficient and thorough evaluations, potentially alleviating the evaluation crisis in the field. We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. The code is available at https://github.com/flairox/jaxmarl.
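The abstract attributes the speed-up to running many environments, and even many training runs, in parallel on the accelerator. The sketch below illustrates the general JAX pattern behind this under simplifying assumptions. It uses a hypothetical two-agent toy environment: `reset`, `step`, `random_policy`, and `rollout` are illustrative names, not the JaxMARL API. Because the environment is written as pure functions, an episode can be rolled out with `jax.lax.scan`, batched over 1,024 PRNG keys with `jax.vmap`, and compiled with `jax.jit`.

```python
import jax
import jax.numpy as jnp

NUM_AGENTS = 2  # hypothetical toy setting, not a JaxMARL environment


def reset(key):
    # Each agent starts at a random 1-D position in [-1, 1].
    return jax.random.uniform(key, (NUM_AGENTS,), minval=-1.0, maxval=1.0)


def step(state, actions):
    # Actions in {0, 1, 2} map to moves of -0.1, 0.0 and +0.1.
    new_state = state + (actions - 1) * 0.1
    # Shared cooperative reward: negative mean distance from the origin.
    reward = -jnp.mean(jnp.abs(new_state))
    return new_state, reward


def random_policy(key):
    # Placeholder policy: one uniformly random discrete action per agent.
    return jax.random.randint(key, (NUM_AGENTS,), 0, 3)


def rollout(key, num_steps=10):
    # Runs one episode and returns the summed shared reward.
    key, reset_key = jax.random.split(key)
    state = reset(reset_key)

    def body(carry, _):
        state, key = carry
        key, act_key = jax.random.split(key)
        state, reward = step(state, random_policy(act_key))
        return (state, key), reward

    _, rewards = jax.lax.scan(body, (state, key), None, length=num_steps)
    return rewards.sum()


# vmap batches the episode over a leading axis of PRNG keys; jit compiles it.
batched_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), 1024)
returns = batched_rollout(keys)  # shape (1024,), one return per episode
```

Because every function here depends only on its inputs and an explicit PRNG key, `vmap` can batch it without code changes. In principle, the same idea extends to vectorizing entire training runs by mapping a training function over a batch of seeds.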
- Alexander Rutherford
- Benjamin Ellis
- Matteo Gallici
- Jonathan Cook
- Andrei Lupu
- Timon Willi
- Akbir Khan
- Christian Schroeder de Witt
- Alexandra Souly
- Saptarashmi Bandyopadhyay
- Mikayel Samvelyan
- Minqi Jiang
- Robert Tjarko Lange
- Shimon Whiteson
- Bruno Lacerda
- Nick Hawes
- Chris Lu
- Jakob Nicolaus Foerster
- Gardar Ingvarsson
- Ravi Hammond