
Mava: a research library for distributed multi-agent reinforcement learning in JAX

Published 3 Jul 2021 in cs.LG and cs.MA (arXiv:2107.01460v2)

Abstract: Multi-agent reinforcement learning (MARL) research is inherently computationally expensive and it is often difficult to obtain a sufficient number of experiment samples to test hypotheses and make robust statistical claims. Furthermore, MARL algorithms are typically complex in their design and can be tricky to implement correctly. These aspects of MARL present a difficult challenge when it comes to creating useful software for advanced research. Our criteria for such software is that it should be simple enough to use to implement new ideas quickly, while at the same time be scalable and fast enough to test those ideas in a reasonable amount of time. In this preliminary technical report, we introduce Mava, a research library for MARL written purely in JAX, that aims to fulfill these criteria. We discuss the design and core features of Mava, and demonstrate its use and performance across a variety of environments. In particular, we show Mava's substantial speed advantage, with improvements of 10-100x compared to other popular MARL frameworks, while maintaining strong performance. This allows for researchers to test ideas in a few minutes instead of several hours. Finally, Mava forms part of an ecosystem of libraries that seamlessly integrate with each other to help facilitate advanced research in MARL. We hope Mava will benefit the community and help drive scientifically sound and statistically robust research in the field. The open-source repository for Mava is available at https://github.com/instadeepai/Mava.


Summary

  • The paper presents Mava, a JAX-based MARL research library that delivers runtime improvements of 10-100x over popular frameworks.
  • It leverages JAX's just-in-time compilation and streamlined code design to simplify debugging and rapid experimentation.
  • Mava integrates seamlessly with JAX-native environments and evaluation tools, supporting both online and offline multi-agent training.

Mava: A JAX-Based Framework for Distributed Multi-Agent Reinforcement Learning

The paper introduces Mava, a research library developed for distributed multi-agent reinforcement learning (MARL) within the JAX framework. Mava is designed to address the inherent complexities and computational demands of MARL research by offering a tool that is both scalable and user-friendly. The library facilitates rapid experimentation by providing substantial speed advantages, ranging from 10 to 100 times faster than existing frameworks, thus significantly decreasing the time required for testing novel hypotheses.

Key Features and Design

Mava distinguishes itself by adopting a clean-code philosophy that, unlike fully modular frameworks, centralizes each algorithm's logic in a single, self-contained file. This design choice enables researchers to debug, adapt, and implement new ideas efficiently without the overhead of excessive boilerplate. Mava employs JAX's just-in-time compilation, particularly through the Anakin architecture, to run scalable and efficient distributed training on hardware accelerators.
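
The Anakin pattern described above can be sketched as a single pure function that JAX compiles once and then runs end-to-end on the accelerator. The following is a hypothetical, minimal illustration (the toy linear "policy", function names, and learning rate are all invented for this sketch, not Mava's actual API):

```python
import jax
import jax.numpy as jnp


def make_update_step(learning_rate=0.1):
    """Build a jit-compiled update step over a batch of observations.

    Illustrative only: the entire loss-gradient-update loop is one pure
    function, so JAX can compile it once and keep it on-device.
    """
    def policy(params, obs):
        # Toy linear score; a real agent network would go here.
        return jnp.dot(obs, params)

    def update_step(params, batch_obs, batch_targets):
        def loss_fn(p):
            # vmap vectorises the policy over the batch dimension.
            preds = jax.vmap(lambda o: policy(p, o))(batch_obs)
            return jnp.mean((preds - batch_targets) ** 2)
        grads = jax.grad(loss_fn)(params)
        return params - learning_rate * grads

    return jax.jit(update_step)


update = make_update_step()
params = jnp.zeros(4)
obs = jnp.ones((8, 4))
targets = jnp.ones(8)
params = update(params, obs, targets)  # one compiled training step
```

Because `update_step` is pure, the same function can be composed with `jax.lax.scan` to unroll many training steps inside a single compiled program, which is where the large wallclock savings come from.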

The library supports environments written in JAX, which is crucial for keeping the full training loop on-device, and it optimizes both recurrent and feedforward policy implementations, including multi-agent PPO configurations that follow the decentralised training with decentralised execution (DTDE) and centralised training with decentralised execution (CTDE) paradigms. Mava provides comprehensive support for multi-device training, along with efficient checkpointing and logging mechanisms that integrate natively with popular tools such as TensorBoard and Neptune.
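
Multi-device training in JAX is conventionally built on `jax.pmap` with a collective to keep replicas in sync. The sketch below is a generic illustration of that pattern, not Mava's actual training code; the loss function and shapes are invented for the example:

```python
from functools import partial

import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()


def loss_fn(params, batch):
    preds = batch @ params
    return jnp.mean(preds ** 2)


@partial(jax.pmap, axis_name="devices")  # one program instance per device
def parallel_grad(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Average gradients across devices so all replicas stay identical.
    return jax.lax.pmean(grads, axis_name="devices")


params = jnp.ones(3)
# Replicate parameters and shard the batch along the device axis.
replicated = jnp.broadcast_to(params, (n_devices, 3))
batch = jnp.ones((n_devices, 5, 3))
grads = parallel_grad(replicated, batch)
```

Each device computes gradients on its own shard of the batch, and `pmean` performs the cross-device average, which is the standard data-parallel recipe that architectures like Anakin build on.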

Integration with Broader Ecosystem

Mava is designed to function seamlessly within an evolving MARL ecosystem. This integration allows it to work effectively alongside other libraries, such as:

  • Jumanji, Matrax, and JaxMARL: These provide JAX-native multi-agent environments, improving interaction speed due to optimized implementations on hardware accelerators.
  • OG-MARL: Mava's compatibility with this library enables offline MARL experimentation, showcasing the synergy between online and offline training methods.
  • MARL-eval: The library allows for standardized and statistically rigorous evaluation reporting, improving the reliability of performance analyses across MARL systems.
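
What makes these JAX-native environments composable with a compiled training loop is that they expose pure `reset`/`step` functions over an explicit state. The sketch below illustrates that interface in the abstract; the names and the toy dynamics are invented for illustration and are not the actual Jumanji, Matrax, or JaxMARL API:

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):
    position: jnp.ndarray  # agent position in a toy 2D world
    t: jnp.ndarray         # timestep counter


def reset(key):
    # Pure function: state is returned, never mutated in place.
    return EnvState(position=jnp.zeros(2), t=jnp.array(0))


def step(state, action):
    new_state = EnvState(position=state.position + action, t=state.t + 1)
    reward = -jnp.sum(jnp.abs(new_state.position))  # stay near the origin
    return new_state, reward


@jax.jit
def rollout(key, actions):
    # Because reset/step are pure, the whole episode can be scanned
    # and jit-compiled as one on-device program.
    def body(state, action):
        state, reward = step(state, action)
        return state, reward
    final_state, rewards = jax.lax.scan(body, reset(key), actions)
    return final_state, rewards


key = jax.random.PRNGKey(0)
actions = jnp.ones((10, 2))
final_state, rewards = rollout(key, actions)
```

The same property lets `jax.vmap` run thousands of such environments in parallel on a single accelerator, which is the main source of the interaction-speed gains the bullet points describe.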

Evaluation and Performance

The paper details experimental benchmarks comparing Mava to existing MARL frameworks, notably EPyMARL and JaxMARL. When tested across diverse scenarios like Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and the StarCraft Multi-Agent Challenge (SMAX), Mava demonstrated competitive or superior performance with significantly reduced wallclock time. Notably, its ability to scale efficiently on advanced hardware, such as TPUs, underscores its practicality for researchers needing rapid experimentation.
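
Wallclock comparisons of jit-compiled programs require some care, since the first call pays a one-off compilation cost and JAX dispatches work asynchronously. A minimal sketch of the standard measurement pattern (the workload here is an arbitrary placeholder, not one of the paper's benchmarks):

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def training_step(x):
    # Placeholder workload standing in for a real training step.
    return jnp.tanh(x @ x.T).sum()


x = jnp.ones((256, 256))

t0 = time.perf_counter()
# block_until_ready() forces async dispatch to finish before timing stops.
training_step(x).block_until_ready()   # first call: includes compile time
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
training_step(x).block_until_ready()   # later calls: cached compiled program
second_call = time.perf_counter() - t0
```

Timing only post-compilation calls (and always blocking on the result) is what makes speedup figures like the reported 10-100x comparable across frameworks.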

Implications and Future Directions

Mava's introduction as a tool for MARL research has several key implications:

  1. Scalability and Flexibility: Its architecture and integration capabilities make it a highly flexible choice for MARL research, allowing seamless transition and interaction with existing JAX-based environments.
  2. Experimentation Efficiency: The significant speed gains and usability improvements it offers can accelerate the development and testing of complex MARL algorithms.
  3. Offline-Online Synergy: The support for offline MARL research opens avenues for more efficient use of training data, potentially reducing the reliance on expensive online interactions.

Looking forward, the continued evolution of the MARL ecosystem, coupled with enhancements to Mava’s capabilities, is poised to catalyze further advancements in the field. Future developments may involve incorporating more sophisticated algorithms, refining integration with additional environments, and expanding support for diverse hardware configurations.

In conclusion, Mava represents a robust, efficient, and versatile addition to the MARL research landscape, offering tools that align well with contemporary needs in computational research. This positions it as a valuable resource to drive forward statistically robust and computationally efficient MARL studies.
