marl-jax: Multi-Agent Reinforcement Learning Framework (2303.13808v2)

Published 24 Mar 2023 in cs.MA and cs.LG

Abstract: Recent advances in Reinforcement Learning (RL) have led to many exciting applications. These advancements have been driven by improvements in both algorithms and engineering, which have resulted in faster training of RL agents. We present marl-jax, a multi-agent reinforcement learning software package for training and evaluating social generalization of the agents. The package is designed for training a population of agents in multi-agent environments and evaluating their ability to generalize to diverse background agents. It is built on top of DeepMind's JAX ecosystem [deepmind2020jax] and leverages the RL ecosystem developed by DeepMind. Our framework marl-jax is capable of working in cooperative and competitive, simultaneous-acting environments with multiple agents. The package offers an intuitive and user-friendly command-line interface for training a population and evaluating its generalization capabilities. In conclusion, marl-jax provides a valuable resource for researchers interested in exploring social generalization in the context of MARL. The open-source code for marl-jax is available at: https://github.com/kinalmehta/marl-jax

Authors (3)

Kinal Mehta
Anuj Mahajan
Pawan Kumar

Summary

The paper introduces marl-jax, a multi-agent reinforcement learning framework focused on evaluating zero-shot generalization to novel partners.
It leverages JAX's vectorization and just-in-time compilation to support high-performance training across single-threaded and distributed architectures.
The framework is validated in complex settings like Overcooked and Melting Pot, demonstrating effectiveness in both cooperative and competitive multi-agent scenarios.

An Overview of marl-jax: Multi-agent Reinforcement Learning Framework

The paper presents marl-jax, a software framework for multi-agent reinforcement learning (MARL) designed for training agents and evaluating their capacity for social generalization. Built atop DeepMind's JAX ecosystem, marl-jax supports training in both competitive and cooperative multi-agent settings. Its distinctive feature is an explicit emphasis on zero-shot generalization: testing whether trained agents can collaborate with or compete against novel partners that were not encountered during training.

Framework Design and Architecture

The marl-jax framework builds on features that JAX provides: automatic differentiation, vectorization, parallelization across devices via pmap, and just-in-time compilation. These underpin its high-performance training for MARL applications. The framework offers four training architectures: single-threaded, synchronous distributed, IMPALA-style asynchronous distributed, and a Sebulba-inspired asynchronous distributed model with a dedicated inference server. The choice lets users match the training setup to the compute they have available.
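The JAX transformations named above compose with one another. The following is a minimal sketch of that composition, not code from marl-jax: the toy loss `loss_fn` and its inputs are assumptions chosen purely for illustration. `jax.pmap` is left out because it only does useful work when several devices are available.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, obs, target):
    """Squared error of a linear predictor (hypothetical toy loss)."""
    pred = jnp.dot(obs, params)
    return (pred - target) ** 2


# grad: differentiate with respect to the first argument (params).
# vmap: map over the leading axis of obs and target, sharing params.
# jit:  compile the combined function with XLA.
per_example_grads = jax.jit(
    jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))
)

params = jnp.array([1.0, -1.0])
obs = jnp.array([[1.0, 2.0], [3.0, 4.0]])
targets = jnp.array([0.0, 1.0])

# One gradient per example, shape (2, 2).
grads = per_example_grads(params, obs, targets)
```

Here each row of `grads` is the gradient for one example, computed in a single compiled call rather than a Python loop.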

Supported Environments

marl-jax supports environments suited to evaluating MARL algorithms, including the Overcooked and Melting Pot environment suites. These environments pose scenarios such as cooperative cooking tasks and social dilemmas, and so serve as benchmarks for assessing how well agents generalize across different interaction dynamics.

Algorithmic Solutions

Two algorithms are implemented in marl-jax. The first is an actor-critic baseline that uses V-trace for off-policy corrections. The second is Options as Responses (OPRE), an algorithm aimed at generalization to novel partners; marl-jax is an early open-source implementation of OPRE.
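To illustrate the off-policy correction, the sketch below computes V-trace value targets for a single trajectory using the standard recursion from the IMPALA paper. It is not taken from marl-jax; the function name `vtrace_targets`, its argument layout, and the default clipping thresholds are assumptions made for this example.

```python
import jax
import jax.numpy as jnp


def vtrace_targets(values, bootstrap_value, rewards, discounts, log_rhos,
                   clip_rho=1.0, clip_c=1.0):
    """V-trace targets v_t for one trajectory of length T (illustrative sketch).

    values:          V(x_t) for t = 0..T-1, shape (T,)
    bootstrap_value: V(x_T), scalar
    log_rhos:        log(pi(a_t|x_t) / mu(a_t|x_t)), shape (T,)
    """
    rhos = jnp.exp(log_rhos)
    clipped_rhos = jnp.minimum(clip_rho, rhos)  # truncated weights rho_t
    cs = jnp.minimum(clip_c, rhos)              # truncated trace weights c_t

    bootstrap = jnp.asarray(bootstrap_value, dtype=values.dtype)
    values_tp1 = jnp.concatenate([values[1:], bootstrap[None]])

    # delta_t = rho_t * (r_t + gamma_t * V(x_{t+1}) - V(x_t))
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    def step(acc, inputs):
        delta, discount, c = inputs
        # v_t - V(x_t) = delta_t + gamma_t * c_t * (v_{t+1} - V(x_{t+1}))
        acc = delta + discount * c * acc
        return acc, acc

    init = jnp.zeros((), dtype=deltas.dtype)
    _, corrections = jax.lax.scan(
        step, init, (deltas, discounts, cs), reverse=True
    )
    return values + corrections


# On-policy case (log_rhos = 0): targets reduce to discounted n-step returns.
values = jnp.zeros(3)
rewards = jnp.ones(3)
discounts = jnp.full(3, 0.5)
targets = vtrace_targets(values, 0.0, rewards, discounts, jnp.zeros(3))
# targets equal [1.75, 1.5, 1.0]
```

When the behaviour and target policies coincide, the importance weights are all 1 and the targets are ordinary bootstrapped returns; clipping only takes effect when the ratio exceeds the thresholds.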

Practical Utilities and Prospects

marl-jax includes command-line scripts for training (train.py) and evaluating (evaluate.py) agent populations, which streamlines experimental workflows and lowers the barrier to entry for MARL researchers. The authors intend to add further algorithms and to keep the framework in step with new research.

Evaluation and Impact

The implementations are evaluated in environments such as Prisoners' Dilemma and Running with Scissors. Results for IMPALA and OPRE across a range of scenarios show how each algorithm behaves in cooperative and competitive settings.
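The idea of scoring a trained agent against partners it has never seen can be made concrete with a toy example. The sketch below is a deliberately simplified assumption, not the Melting Pot substrate or the evaluate.py logic: it uses a one-shot matrix-form Prisoner's Dilemma with textbook payoffs, and the names `PAYOFF`, `expected_focal_return`, and the three background policies are hypothetical.

```python
import jax
import jax.numpy as jnp

# Row player's payoff; index 0 = cooperate, 1 = defect (illustrative values).
PAYOFF = jnp.array([[3.0, 0.0],
                    [5.0, 1.0]])


def expected_focal_return(focal_probs, background_probs):
    """Expected payoff of the focal agent against one background agent."""
    return focal_probs @ PAYOFF @ background_probs


# Evaluate one fixed focal policy against every held-out background policy.
evaluate_against_population = jax.jit(
    jax.vmap(expected_focal_return, in_axes=(None, 0))
)

focal = jnp.array([0.0, 1.0])            # always defects
background = jnp.array([[1.0, 0.0],      # always cooperates
                        [0.0, 1.0],      # always defects
                        [0.5, 0.5]])     # uniformly random

scores = evaluate_against_population(focal, background)
# scores equal [5.0, 1.0, 3.0]
```

The focal policy is held fixed during evaluation, so the spread of `scores` reflects how its return depends on the partner rather than any further learning.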

Conclusion and Future Directions

marl-jax contributes an optimized, open-source framework to MARL research that addresses both the practical application of algorithms and the question of generalization to new partners. Future work is expected to integrate additional algorithms. As MARL evolves, marl-jax is positioned to serve as a foundation for further research on social generalization.
