
Mava: a research library for distributed multi-agent reinforcement learning in JAX

Published 3 Jul 2021 in cs.LG and cs.MA (arXiv:2107.01460v2)

Abstract: Multi-agent reinforcement learning (MARL) research is inherently computationally expensive and it is often difficult to obtain a sufficient number of experiment samples to test hypotheses and make robust statistical claims. Furthermore, MARL algorithms are typically complex in their design and can be tricky to implement correctly. These aspects of MARL present a difficult challenge when it comes to creating useful software for advanced research. Our criteria for such software is that it should be simple enough to use to implement new ideas quickly, while at the same time be scalable and fast enough to test those ideas in a reasonable amount of time. In this preliminary technical report, we introduce Mava, a research library for MARL written purely in JAX, that aims to fulfill these criteria. We discuss the design and core features of Mava, and demonstrate its use and performance across a variety of environments. In particular, we show Mava's substantial speed advantage, with improvements of 10-100x compared to other popular MARL frameworks, while maintaining strong performance. This allows for researchers to test ideas in a few minutes instead of several hours. Finally, Mava forms part of an ecosystem of libraries that seamlessly integrate with each other to help facilitate advanced research in MARL. We hope Mava will benefit the community and help drive scientifically sound and statistically robust research in the field. The open-source repository for Mava is available at https://github.com/instadeepai/Mava.


Summary

  • The paper presents Mava, a JAX-based MARL research library that delivers runtime improvements of 10-100x over popular frameworks.
  • It leverages JAX's just-in-time compilation and streamlined code design to simplify debugging and rapid experimentation.
  • Mava integrates seamlessly with JAX-native environments and evaluation tools, supporting both online and offline multi-agent training.

Mava: A JAX-Based Framework for Distributed Multi-Agent Reinforcement Learning

The paper introduces Mava, a research library developed for distributed multi-agent reinforcement learning (MARL) within the JAX framework. Mava is designed to address the inherent complexities and computational demands of MARL research by offering a tool that is both scalable and user-friendly. The library facilitates rapid experimentation by providing substantial speed advantages, ranging from 10 to 100 times faster than existing frameworks, thus significantly decreasing the time required for testing novel hypotheses.

Key Features and Design

Mava distinguishes itself by adopting a clean-code philosophy that, unlike fully modular frameworks, centralizes each algorithm's logic in a single, self-contained file. This design choice enables researchers to debug, adapt, and implement new ideas efficiently without the overhead of excessive boilerplate. Mava employs JAX's just-in-time compilation, particularly through the Anakin architecture, to run scalable and efficient distributed training on hardware accelerators.
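
The Anakin pattern described above can be sketched as a single pure function that JAX compiles once and then runs end-to-end on the accelerator. The following is a hypothetical, minimal illustration (the toy linear "policy", function names, and learning rate are all invented for this sketch, not Mava's actual API):

```python
import jax
import jax.numpy as jnp


def make_update_step(learning_rate=0.1):
    """Build a jit-compiled update step over a batch of observations.

    Illustrative only: the entire loss-gradient-update loop is one pure
    function, so JAX can compile it once and keep it on-device.
    """
    def policy(params, obs):
        # Toy linear score; a real agent network would go here.
        return jnp.dot(obs, params)

    def update_step(params, batch_obs, batch_targets):
        def loss_fn(p):
            # vmap vectorises the policy over the batch dimension.
            preds = jax.vmap(lambda o: policy(p, o))(batch_obs)
            return jnp.mean((preds - batch_targets) ** 2)
        grads = jax.grad(loss_fn)(params)
        return params - learning_rate * grads

    return jax.jit(update_step)


update = make_update_step()
params = jnp.zeros(4)
obs = jnp.ones((8, 4))
targets = jnp.ones(8)
params = update(params, obs, targets)  # one compiled training step
```

Because `update_step` is pure, the same function can be composed with `jax.lax.scan` to unroll many training steps inside a single compiled program, which is where the large wallclock savings come from.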

The library supports environments written in JAX, which is crucial for keeping the full training loop on-device, and it optimizes both recurrent and feedforward policy implementations, including multi-agent PPO configurations that follow the decentralised training with decentralised execution (DTDE) and centralised training with decentralised execution (CTDE) paradigms. Mava provides comprehensive support for multi-device training, along with efficient checkpointing and logging mechanisms that integrate natively with popular tools such as TensorBoard and Neptune.
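
Multi-device training in JAX is conventionally built on `jax.pmap` with a collective to keep replicas in sync. The sketch below is a generic illustration of that pattern, not Mava's actual training code; the loss function and shapes are invented for the example:

```python
from functools import partial

import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()


def loss_fn(params, batch):
    preds = batch @ params
    return jnp.mean(preds ** 2)


@partial(jax.pmap, axis_name="devices")  # one program instance per device
def parallel_grad(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Average gradients across devices so all replicas stay identical.
    return jax.lax.pmean(grads, axis_name="devices")


params = jnp.ones(3)
# Replicate parameters and shard the batch along the device axis.
replicated = jnp.broadcast_to(params, (n_devices, 3))
batch = jnp.ones((n_devices, 5, 3))
grads = parallel_grad(replicated, batch)
```

Each device computes gradients on its own shard of the batch, and `pmean` performs the cross-device average, which is the standard data-parallel recipe that architectures like Anakin build on.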

Integration with Broader Ecosystem

Mava is designed to function seamlessly within an evolving MARL ecosystem. This integration allows it to work effectively alongside other libraries, such as:

  • Jumanji, Matrax, and JaxMARL: These provide JAX-native multi-agent environments, improving interaction speed due to optimized implementations on hardware accelerators.
  • OG-MARL: Mava's compatibility with this library enables offline MARL experimentation, showcasing the synergy between online and offline training methods.
  • MARL-eval: The library allows for standardized and statistically rigorous evaluation reporting, improving the reliability of performance analyses across MARL systems.
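
What makes these JAX-native environments composable with a compiled training loop is that they expose pure `reset`/`step` functions over an explicit state. The sketch below illustrates that interface in the abstract; the names and the toy dynamics are invented for illustration and are not the actual Jumanji, Matrax, or JaxMARL API:

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):
    position: jnp.ndarray  # agent position in a toy 2D world
    t: jnp.ndarray         # timestep counter


def reset(key):
    # Pure function: state is returned, never mutated in place.
    return EnvState(position=jnp.zeros(2), t=jnp.array(0))


def step(state, action):
    new_state = EnvState(position=state.position + action, t=state.t + 1)
    reward = -jnp.sum(jnp.abs(new_state.position))  # stay near the origin
    return new_state, reward


@jax.jit
def rollout(key, actions):
    # Because reset/step are pure, the whole episode can be scanned
    # and jit-compiled as one on-device program.
    def body(state, action):
        state, reward = step(state, action)
        return state, reward
    final_state, rewards = jax.lax.scan(body, reset(key), actions)
    return final_state, rewards


key = jax.random.PRNGKey(0)
actions = jnp.ones((10, 2))
final_state, rewards = rollout(key, actions)
```

The same property lets `jax.vmap` run thousands of such environments in parallel on a single accelerator, which is the main source of the interaction-speed gains the bullet points describe.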

Evaluation and Performance

The paper details experimental benchmarks comparing Mava to existing MARL frameworks, notably EPyMARL and JaxMARL. When tested across diverse scenarios like Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and the StarCraft Multi-Agent Challenge (SMAX), Mava demonstrated competitive or superior performance with significantly reduced wallclock time. Notably, its ability to scale efficiently on advanced hardware, such as TPUs, underscores its practicality for researchers needing rapid experimentation.
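
Wallclock comparisons of jit-compiled programs require some care, since the first call pays a one-off compilation cost and JAX dispatches work asynchronously. A minimal sketch of the standard measurement pattern (the workload here is an arbitrary placeholder, not one of the paper's benchmarks):

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def training_step(x):
    # Placeholder workload standing in for a real training step.
    return jnp.tanh(x @ x.T).sum()


x = jnp.ones((256, 256))

t0 = time.perf_counter()
# block_until_ready() forces async dispatch to finish before timing stops.
training_step(x).block_until_ready()   # first call: includes compile time
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
training_step(x).block_until_ready()   # later calls: cached compiled program
second_call = time.perf_counter() - t0
```

Timing only post-compilation calls (and always blocking on the result) is what makes speedup figures like the reported 10-100x comparable across frameworks.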

Implications and Future Directions

Mava's introduction as a tool for MARL research has several key implications:

  1. Scalability and Flexibility: Its architecture and integration capabilities make it a highly flexible choice for MARL research, allowing seamless transition and interaction with existing JAX-based environments.
  2. Experimentation Efficiency: The significant speed gains and usability improvements it offers can accelerate the development and testing of complex MARL algorithms.
  3. Offline-Online Synergy: The support for offline MARL research opens avenues for more efficient use of training data, potentially reducing the reliance on expensive online interactions.

Looking forward, the continued evolution of the MARL ecosystem, coupled with enhancements to Mava’s capabilities, is poised to catalyze further advancements in the field. Future developments may involve incorporating more sophisticated algorithms, refining integration with additional environments, and expanding support for diverse hardware configurations.

In conclusion, Mava represents a robust, efficient, and versatile addition to the MARL research landscape, offering tools that align well with contemporary needs in computational research. This positions it as a valuable resource to drive forward statistically robust and computationally efficient MARL studies.
