JaxMARL: Multi-Agent RL Environments and Algorithms in JAX (2311.10090v5)

Published 16 Nov 2023 in cs.LG, cs.AI, and cs.MA

Abstract: Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research. Traditionally, RL environments run on the CPU, which limits their scalability with typical academic compute. However, recent advancements in JAX have enabled the wider use of hardware acceleration, enabling massively parallel RL training pipelines and environments. While this has been successfully applied to single-agent RL, it has not yet been widely adopted for multi-agent scenarios. In this paper, we present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments and popular baseline algorithms. Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches, and up to 12500x when multiple training runs are vectorized. This enables efficient and thorough evaluations, potentially alleviating the evaluation crisis in the field. We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. The code is available at https://github.com/flairox/jaxmarl.

Authors (21)
  1. Alexander Rutherford (3 papers)
  2. Benjamin Ellis (12 papers)
  3. Matteo Gallici (6 papers)
  4. Jonathan Cook (9 papers)
  5. Andrei Lupu (14 papers)
  6. Timon Willi (13 papers)
  7. Akbir Khan (17 papers)
  8. Christian Schroeder de Witt (49 papers)
  9. Alexandra Souly (6 papers)
  10. Saptarashmi Bandyopadhyay (7 papers)
  11. Mikayel Samvelyan (22 papers)
  12. Minqi Jiang (31 papers)
  13. Robert Tjarko Lange (21 papers)
  14. Shimon Whiteson (122 papers)
  15. Bruno Lacerda (19 papers)
  16. Nick Hawes (38 papers)
  17. Chris Lu (33 papers)
  18. Jakob Nicolaus Foerster (15 papers)
  19. Gardar Ingvarsson (1 paper)
  20. Ravi Hammond (4 papers)
Citations (27)

Summary

Insights into JaxMARL: Multi-Agent Reinforcement Learning with JAX

The paper "JaxMARL: Multi-Agent RL Environments and Algorithms in JAX" introduces a comprehensive open-source library that brings together a wide array of multi-agent reinforcement learning (MARL) environments and algorithms in JAX. This library addresses several critical challenges faced by the MARL community, including computational inefficiencies and inconsistencies in evaluation standards.

Key Contributions

JaxMARL presents notable contributions to the MARL field:

  1. End-to-End GPU Acceleration: Leveraging JAX's ecosystem, JaxMARL runs both environment simulation and algorithmic computation on hardware accelerators. Experimentally, the training pipeline is around 14x faster than existing CPU-based approaches in wall-clock time, and up to 12500x faster when multiple training runs are vectorized.
  2. Environment Diversity: The library incorporates a range of popular MARL environments, such as SMAX, the Multi-Agent Particle Environments (MPE), and others, all unified under a single API (a usage sketch follows this list). SMAX, in particular, offers a scalable alternative to existing platforms like SMAC, enabling flexible scenarios with efficient resource utilization.
  3. Algorithmic Implementation: JaxMARL provides JAX implementations of pivotal MARL techniques like Independent PPO (IPPO), QMIX, VDN, and IQL, enhancing both accessibility and performance for practitioners.
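
As a rough illustration of the unified interface, the sketch below shows what stepping a JaxMARL-style environment might look like. The scenario string, the `make` helper, and the exact reset/step signatures are assumptions modeled on typical JAX-based environment libraries, not details confirmed by the paper; consult the JaxMARL repository for the actual API.

```python
import jax
from jaxmarl import make  # assumed entry point

# Split a PRNG key for reset, action sampling, and stepping (pure-functional JAX style).
key = jax.random.PRNGKey(0)
key_reset, key_act, key_step = jax.random.split(key, 3)

# Hypothetical scenario identifier for a Multi-Agent Particle Environment task.
env = make("MPE_simple_spread_v3")

# Functional reset: returns per-agent observations and an explicit environment state.
obs, state = env.reset(key_reset)

# Sample one random action per agent from its action space.
actions = {
    agent: env.action_space(agent).sample(key_act)
    for agent in env.agents
}

# Functional step: the state is passed in and a new state is returned, so the whole
# interaction can be jit-compiled and vmapped across many parallel environments.
obs, state, rewards, dones, infos = env.step(key_step, state, actions)
```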

Numerical Findings and Implications

The paper's central numerical finding is the wall-clock speedup obtained by keeping both environments and training on the accelerator. These gains enable more thorough evaluations and faster iteration cycles, lowering the computational barriers traditionally associated with MARL experiments. The reported speedups of up to 12500x for IPPO and 40000x for SMAX scenarios illustrate the improvements in research efficiency that JaxMARL introduces.
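
To make the vectorization claim concrete, the following sketch shows the general JAX pattern of batching entire training runs over random seeds with `jax.vmap`. The `train` function here is a hypothetical placeholder, not JaxMARL's actual training loop.

```python
import jax
import jax.numpy as jnp

def train(rng: jax.Array) -> jax.Array:
    """Placeholder for a full jit-compatible MARL training run.

    In a real pipeline this would roll out vectorized environments and apply
    algorithm updates; here it just returns a dummy scalar per seed.
    """
    return jnp.mean(jax.random.normal(rng, (128,)))

# One PRNG key per independent training run.
seeds = jax.random.split(jax.random.PRNGKey(0), 16)

# vmap maps the whole run over the seed axis; jit compiles the batched program,
# so all 16 runs execute in a single launch on the accelerator.
batched_train = jax.jit(jax.vmap(train))
final_metrics = batched_train(seeds)  # shape (16,)
```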

Furthermore, these results suggest potential improvements in evaluation standards within the MARL community, enabling evaluations across a broader set of domains and reducing the risk of biased comparisons or unsound conclusions that have affected prior work.

Theoretical and Practical Implications

The theoretical impact of JaxMARL is manifold. By unifying training and environment simulation under a scalable JAX-based framework, the library demonstrates the viability of massively parallel MARL training. This paves the way for researchers to explore more complex agent interactions at scale, such as meta-learning and self-play, without the prohibitively high computational costs usually involved.

Practically, the library's modular and clear design philosophy, inspired by frameworks like PettingZoo and Gymnax, ensures accessibility and adaptability for researchers, even those with limited resources. This ease of use furthers the adoption of hardware accelerators and large-scale parallelization in typical academic settings.

Future Developments

The emergence of JaxMARL signals substantial opportunities for future advancements in the MARL landscape. Potential developments could involve extending the library with more complex and realistic environments, further enhancing the robustness and breadth of the evaluation framework.

Moreover, explorations into other computational frameworks leveraging TPUs or AI-specific hardware could lead to even more significant computational efficiencies. Integrating JaxMARL with advanced automated hyperparameter tuning tools and population-based training strategies could yield notable benefits.

Conclusion

"JaxMARL: Multi-Agent RL Environments and Algorithms in JAX" stands as a pivotal work, amplifying the efficiency and quality of MARL research. By offering substantial performance enhancements and a consistent evaluation framework, JaxMARL presents a powerful toolkit for researchers seeking to address the intricate challenges of multi-agent systems. This initiative holds promise for cultivating a more innovative and effective research ecosystem in the field of reinforcement learning.
