- The paper introduces the minimax library that cuts UED experiment times by over 120x using JAX’s efficient parallelization techniques.
- The authors demonstrate a modular design with components like tensorized grid-world environments that streamline reinforcement learning experimentation.
- The library’s fully parallelized multi-device training broadens research accessibility by reducing computational barriers in unsupervised environment design.
Analyzing "minimax: Efficient Baselines for Autocurricula in JAX"
The paper "minimax: Efficient Baselines for Autocurricula in JAX" by Minqi Jiang et al. presents a significant contribution to the field of unsupervised environment design (UED) within reinforcement learning (RL) through the introduction of the minimax library. This library addresses a crucial challenge in the area: the prohibitive computational demands of existing UED implementations, which constitute a barrier to rapid experimentation and innovation.
Overview of Unsupervised Environment Design
Unsupervised Environment Design (UED) uses automatic curriculum learning to train robust decision-making agents that generalize to unseen environments. It builds on autocurricula: curricula that are not specified by hand but emerge from the interaction of adapting agents. UED extends this idea to the design of the training tasks themselves: a teacher, often adversarial, dynamically adjusts the training environment in response to a student agent's behavior, with the goal of maximizing the student's robustness and transfer capabilities. In regret-based variants such as PAIRED, the teacher is rewarded for proposing environments on which the student's regret, the gap between its return and that of a stronger reference policy, is largest.
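To make this loop concrete, here is a deliberately simplified sketch of one regret-based UED step in the style of PAIRED. The functions `propose_level` and `episode_return` and the scalar "skill" parameters are hypothetical stand-ins for illustration only, not part of the minimax API:

```python
import jax

# Toy illustration of a regret-based UED step (PAIRED-style).
# All names here are hypothetical stand-ins, not minimax's API.

def propose_level(rng):
    """Teacher proposes a level; here just a scalar wall density in [0, 1]."""
    return jax.random.uniform(rng)

def episode_return(skill, level):
    """Stand-in for a rollout: higher skill and easier levels score higher."""
    return skill * (1.0 - level)

def ued_step(rng, protagonist_skill, antagonist_skill):
    level = propose_level(rng)
    # Regret: the gap between a stronger reference policy (antagonist)
    # and the learning student (protagonist) on this level.
    regret = (episode_return(antagonist_skill, level)
              - episode_return(protagonist_skill, level))
    # The teacher trains to propose high-regret levels; the protagonist
    # trains to reduce regret, producing a minimax game over levels.
    return level, regret

rng = jax.random.PRNGKey(0)
level, regret = ued_step(rng, protagonist_skill=0.5, antagonist_skill=0.9)
```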
The Need for Efficient Baselines
Prior to minimax, UED experiments were typically implemented in PyTorch and run on conventional CPU and GPU hardware, incurring substantial computational cost and training times that often stretched over several weeks. To address these inefficiencies and accelerate UED research, the authors propose the minimax library. Implemented in JAX, minimax exploits hardware acceleration, compiling and vectorizing entire training loops on the accelerator, to reduce the wall-clock time of UED experiments by more than 120 times relative to previous baselines.
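The core JAX pattern behind such speedups is to express each environment step as a pure function, then batch it with `vmap` and compile it with `jit`. The sketch below uses a toy step function (an assumption for illustration, not minimax's environment interface):

```python
import jax
import jax.numpy as jnp

# Why JAX helps: a pure-function step can be vectorized across thousands
# of environment instances and compiled into fused accelerator kernels.

def toy_step(state, action):
    # Trivial dynamics standing in for one environment transition.
    new_state = state + action
    reward = -jnp.abs(new_state)
    return new_state, reward

# Batch across environments, then compile: one device launch steps them all.
batched_step = jax.jit(jax.vmap(toy_step))

n_envs = 4096
states = jnp.zeros(n_envs)
actions = jnp.ones(n_envs)
states, rewards = batched_step(states, actions)  # steps all 4096 envs at once
```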
Key Features of the Minimax Library
Minimax's design emphasizes modularity, enabling flexible experimentation with varied RL setups. Its core components, namely environments, agents, models, and curriculum runners, are decoupled so that they can be freely recombined. The library also provides a tensorized grid-world environment, AMaze, tailored to UED experimentation: it replicates existing maze benchmarks while keeping the entire environment state in accelerator-friendly tensors, allowing many instances to be stepped in parallel.
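As an illustration of what "tensorized" means in practice, the minimal sketch below keeps an entire grid-world state in JAX arrays so that stepping is a pure, batchable function. `MazeState` and `step` are hypothetical and not AMaze's actual implementation:

```python
import jax.numpy as jnp
from typing import NamedTuple

# Sketch of a tensorized grid world: all state lives in arrays, so the
# step function is pure and can be batched with vmap and compiled with jit.
# Illustrative only; not AMaze's actual code.

class MazeState(NamedTuple):
    walls: jnp.ndarray   # (H, W) binary wall map
    pos: jnp.ndarray     # (2,) agent position (row, col)
    goal: jnp.ndarray    # (2,) goal position

MOVES = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]])  # up, down, left, right

def step(state: MazeState, action: jnp.ndarray):
    proposed = state.pos + MOVES[action]
    h, w = state.walls.shape
    proposed = jnp.clip(proposed, 0, jnp.array([h - 1, w - 1]))
    # Stay put if the proposed cell is a wall (branch-free, so it traces).
    blocked = state.walls[proposed[0], proposed[1]]
    new_pos = jnp.where(blocked, state.pos, proposed)
    reward = jnp.all(new_pos == state.goal).astype(jnp.float32)
    return state._replace(pos=new_pos), reward

# usage: a 5x5 maze with one wall; move the agent down one cell
walls = jnp.zeros((5, 5), dtype=jnp.int32).at[1, 1].set(1)
state = MazeState(walls=walls, pos=jnp.array([0, 0]), goal=jnp.array([4, 4]))
state, reward = step(state, jnp.array(1))
```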
A further highlight of the minimax library is its fully parallelized, multi-device training, which yields significant additional performance gains. The library implements foundational UED methods, including PAIRED, Prioritized Level Replay (PLR), and ACCEL, and introduces parallelized variants, Parallel PLR and Parallel ACCEL, that deliver substantial further speedups without sacrificing task performance.
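A minimal sketch of the data-parallel, multi-device pattern that JAX enables via `jax.pmap` is shown below; `loss_fn` and `train_step` are toy stand-ins under that assumption, not minimax's actual runner code:

```python
import jax
import jax.numpy as jnp
from functools import partial

# Data parallelism with jax.pmap: each device trains on its own batch,
# and a collective keeps the replicated parameters in sync.

def loss_fn(params, batch):
    # Toy quadratic loss standing in for a PPO-style objective.
    return jnp.mean((batch * params) ** 2)

@partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Average gradients across devices so all replicas stay identical.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return params - 1e-2 * grads

n_dev = jax.local_device_count()
params = jnp.ones((n_dev,))      # one replica of the (scalar) params per device
batch = jnp.ones((n_dev, 128))   # each device receives its own data slice
params = train_step(params, batch)
```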
Numerical Results and Performance
The reported experiments demonstrate dramatic improvements in computational efficiency. For instance, training runs that previously took over 100 hours can be completed in under 3 hours using minimax's parallelized implementations. Importantly, minimax's implementations match or exceed their predecessors on benchmark tasks, showing that the transition to JAX did not compromise learning outcomes.
Implications for Future Research
The minimax library paves the way for more accessible research and development within the UED domain by drastically reducing computational requirements. This increased efficiency enables broader participation from research groups globally, regardless of computational resource constraints.
The modular and flexible design of minimax suggests it can serve as a foundational tool not only for UED but also for broader applications that require dynamic curricula or policy adaptation, such as complex multi-agent environments. Future work could further explore the integration of modern state-space models, such as the S5 architecture used in this paper, for RL policy optimization, potentially improving agent performance under partial observability and task diversity.
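As a rough illustration of the state-space idea, the sketch below implements a plain linear SSM layer with `jax.lax.scan`. Real S5 uses a diagonalized continuous-time parameterization and a parallel scan, so this is only a conceptual toy under those simplifying assumptions:

```python
import jax
import jax.numpy as jnp

# Toy linear state-space layer: x_{t+1} = A x_t + B u_t,  y_t = C x_t.
# A sequential scan stands in for S5's parallel scan; illustrative only.

def ssm_layer(A, B, C, inputs):
    def step_fn(x, u):
        x_next = A @ x + B @ u
        return x_next, C @ x_next
    x0 = jnp.zeros(A.shape[0])
    _, ys = jax.lax.scan(step_fn, x0, inputs)
    return ys  # (T, output_dim) recurrent features for a policy head

# usage: a length-16 sequence of 4-dim observations
kA, kB, kC, ku = jax.random.split(jax.random.PRNGKey(0), 4)
A = 0.9 * jax.random.normal(kA, (8, 8)) / jnp.sqrt(8.0)  # scaled for stability
B = jax.random.normal(kB, (8, 4))
C = jax.random.normal(kC, (2, 8))
obs = jax.random.normal(ku, (16, 4))
features = ssm_layer(A, B, C, obs)
```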
Ultimately, minimax represents a pivotal step toward democratizing research in UED, providing a robust platform on which theorists and practitioners alike can experiment with new methods and advance the capabilities of agents trained through autocurricula.