Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning (2402.16801v2)

Published 26 Feb 2024 in cs.LG

Abstract: Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.

Evaluating Open-Ended Reinforcement Learning with Craftax

Introduction to Craftax

The field of reinforcement learning (RL) has made significant strides, driven in part by benchmarks used to evaluate the performance and capabilities of RL algorithms. A persistent difficulty, however, is balancing the complexity of these benchmarks against their computational demands. Craftax emerges as a solution, offering a sophisticated yet computationally accessible environment for open-ended reinforcement learning research. This post explores Craftax and its implications for future AI developments.

Addressing the Gap in Existing Benchmarks

Current RL benchmarks swing between two extremes: they are either computationally intensive, restricting access to those with significant resources, or they lack complexity, offering little challenge to state-of-the-art methods. Craftax introduces two versions: Craftax-Classic, a remake of Crafter with substantial runtime improvements, and the main Craftax benchmark, which extends the complexity significantly while remaining computationally feasible. The main benchmark incorporates mechanics inspired by the game NetHack, demanding deeper exploration, long-term planning, memory, and continual adaptation from the algorithms it tests.

Craftax-Classic: A Faster Crafter

Craftax-Classic is a ground-up rewrite of Crafter in JAX that runs up to 250 times faster than the Python-native original. A PPO run of one billion environment interactions completes in under an hour on a single GPU and reaches roughly 90% of the optimal reward, showing that near-optimal performance on Crafter is attainable with vastly reduced computational requirements.
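The speedup comes from running the environment itself on the accelerator: because the simulator is written in pure JAX, thousands of environment copies can be stepped in lockstep with jax.vmap and the whole rollout compiled with jax.jit. The sketch below illustrates this pattern with a generic gymnax-style interface; the env.reset/env.step names and signatures are assumptions for illustration, not the exact Craftax API.

```python
import jax

# Vectorised rollout sketch in a gymnax-style interface. The env.reset/env.step
# names and signatures below are assumptions for illustration, not the exact
# Craftax API.

def make_rollout_fn(env, env_params, policy_apply, num_envs, num_steps):
    @jax.jit
    def rollout(rng, policy_params):
        rng, reset_rng = jax.random.split(rng)
        reset_keys = jax.random.split(reset_rng, num_envs)
        # vmap steps a batch of independent environments in lockstep on one device.
        obs, state = jax.vmap(env.reset, in_axes=(0, None))(reset_keys, env_params)

        def step_fn(carry, _):
            rng, obs, state = carry
            rng, act_rng, step_rng = jax.random.split(rng, 3)
            logits = policy_apply(policy_params, obs)          # batched policy forward pass
            action = jax.random.categorical(act_rng, logits)   # one action per environment
            step_keys = jax.random.split(step_rng, num_envs)
            obs, state, reward, done, _ = jax.vmap(
                env.step, in_axes=(0, 0, 0, None)
            )(step_keys, state, action, env_params)
            return (rng, obs, state), (obs, action, reward, done)

        # lax.scan fuses the whole rollout into a single compiled computation,
        # avoiding the Python-interpreter overhead of a step-by-step loop.
        _, trajectory = jax.lax.scan(step_fn, (rng, obs, state), None, length=num_steps)
        return trajectory

    return rollout
```

With this structure the training loop, including the PPO update, can typically also be compiled end to end, which is the kind of setup that lets a billion environment interactions finish in under an hour on a single GPU.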

The Main Craftax Benchmark

Building on the foundation of Craftax-Classic, the main Craftax benchmark introduces more intricate game mechanics, including multiple floors with unique challenges, an overhauled combat system, new creatures, and mechanics like potions, enchantments, and attributes. These features necessitate advanced strategies and in-context learning from agents, pushing the boundaries of current RL algorithms. Preliminary results indicate existing methods struggle to advance in this environment, underscoring the challenge Craftax represents.

Technical Insights and Initial Results

  • Exploration Baselines: Initial investigations ran PPO together with exploration bonuses such as the Intrinsic Curiosity Module (ICM) and Exploration via Elliptical Episodic Bonuses (E3B). These agents made little progress beyond the basic and intermediate achievements in the Craftax world, suggesting substantial room for improvement in exploration strategies; a minimal sketch of the episodic-bonus idea follows this list.
  • UED Baselines: Unsupervised Environment Design (UED) methods such as PLR and ACCEL produced some improvement, but no method achieved marked success on the harder achievements, underscoring how formidable the benchmark remains.
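For readers unfamiliar with the episodic bonus used by E3B, the rough idea is to reward visiting states whose embeddings are poorly covered by the embeddings already seen in the current episode, measured against the ellipse defined by the episode's feature covariance. The following is a minimal sketch of that bonus under the assumption of a fixed embedding function phi; the actual method learns the embedding with an inverse-dynamics model and differs in detail.

```python
import jax.numpy as jnp

# Minimal elliptical-episodic-bonus sketch (in the spirit of E3B), assuming some
# fixed embedding phi(s) of dimension `dim`. Not the paper's exact formulation.

def init_episode(dim, lam=0.1):
    # Reset at every episode start: C^{-1} with C = lam * I.
    return jnp.eye(dim) / lam

def episodic_bonus(cov_inv, phi):
    """Return the bonus phi^T C^{-1} phi and the rank-one (Sherman-Morrison)
    update of C^{-1} after adding phi to the episode's covariance."""
    bonus = phi @ cov_inv @ phi
    u = cov_inv @ phi
    new_cov_inv = cov_inv - jnp.outer(u, u) / (1.0 + bonus)
    return bonus, new_cov_inv
```

During training the bonus is added to the extrinsic reward at each step, and cov_inv is reset whenever an episode ends, which is what makes the bonus episodic; global bonuses such as ICM's prediction error instead persist across the whole of training.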

Implications and Future Directions

Craftax sets a new standard for evaluating and advancing RL algorithms. Its complexity coupled with computational accessibility enables broader participation in cutting-edge AI research. The environment's depth and breadth suggest it could drive significant progress in areas critical to the development of general artificial intelligence, such as deep exploration, skill acquisition, generalization, and long-term strategic planning. As researchers begin to tackle the challenges Craftax presents, we can anticipate novel approaches and methodologies emerging, pushing the frontiers of what's achievable in open-ended RL.

Conclusion

Craftax is a pivotal development in the pursuit of more sophisticated and capable reinforcement learning algorithms. By balancing computational efficiency with complex, open-ended dynamics, it offers a new arena for researchers to test the limits of current RL methodologies and innovate towards the next generation of AI capabilities. The journey to mastering Craftax is just beginning, and the insights gained will undoubtedly influence the course of AI research in the years to come.

References (50)
  1. The hanabi challenge: A new frontier for ai research. Artificial Intelligence, 280:103216, 2020.
  2. Barto, A. G. Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems, pp.  17–47, 2013.
  3. Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems, 29, 2016.
  4. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
  5. Jumanji: a diverse suite of scalable reinforcement learning environments in jax, 2023. URL https://arxiv.org/abs/2306.09884.
  6. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
  7. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
  8. Exploration by random network distillation. CoRR, abs/1810.12894, 2018. URL http://arxiv.org/abs/1810.12894.
  9. Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
  10. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
  11. Leveraging procedural generation to benchmark reinforcement learning. In International conference on machine learning, pp.  2048–2056. PMLR, 2020.
  12. Emergent complexity and zero-shot transfer via unsupervised environment design. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/985e9a46e10005356bbaf194249f6856-Abstract.html.
  13. First return, then explore. Nature, 590(7847):580–586, 2021.
  14. Smacv2: An improved benchmark for cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  15. Brax – a differentiable physics engine for large scale rigid body simulation, 2021. URL https://arxiv.org/abs/2106.13281.
  16. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pp.  1861–1870. PMLR, 2018.
  17. Hafner, D. Benchmarking the spectrum of agent capabilities. In International Conference on Learning Representations, 2021.
  18. Hambro, E. et al. Insights from the neurips 2021 nethack challenge. In NeurIPS 2021 Competitions and Demonstrations Track, pp.  41–52. PMLR, 2022.
  19. Exploration via elliptical episodic bonuses. Advances in Neural Information Processing Systems, 35:37631–37646, 2022.
  20. A study of global and episodic bonuses for exploration in contextual mdps. arXiv preprint arXiv:2306.03236, 2023.
  21. Jakobi, N. Evolutionary robotics and the radical envelope-of-noise hypothesis. Adaptive behavior, 6(2):325–368, 1997.
  22. Replay-guided adversarial environment design. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, pp.  1884–1897, 2021a. URL https://proceedings.neurips.cc/paper/2021/hash/0e915db6326b6fb6a3c56546980a8c93-Abstract.html.
  23. Prioritized level replay. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  4940–4950. PMLR, 2021b. URL http://proceedings.mlr.press/v139/jiang21b.html.
  24. Grounding aleatoric uncertainty for unsupervised environment design. Advances in Neural Information Processing Systems, 35:32868–32881, 2022.
  25. minimax: Efficient baselines for autocurricula in jax. In Agent Learning in Open-Endedness Workshop at NeurIPS, 2023.
  26. The malmo platform for artificial intelligence experimentation. In Ijcai, pp.  4246–4247, 2016.
  27. Pgx: Hardware-accelerated parallel game simulators for reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
  28. The nethack learning environment. Advances in Neural Information Processing Systems, 33:7671–7684, 2020.
  29. Lange, R. T. gymnax: A JAX-based reinforcement learning environment library, 2022. URL http://github.com/RobertTLange/gymnax.
  30. Exploiting open-endedness to solve problems through the search for novelty. Artificial Life - ALIFE, 01 2008.
  31. Discovered policy optimisation. Advances in Neural Information Processing Systems, 35:16455–16468, 2022.
  32. Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research, 61:523–562, 2018.
  33. XLand-minigrid: Scalable meta-reinforcement learning environments in JAX. In Intrinsically-Motivated and Open-Ended Learning Workshop, NeurIPS2023, 2023. URL https://openreview.net/forum?id=xALDC4aHGz.
  34. Open Ended Learning Team et al. Open-ended learning leads to generally capable agents, 2021. URL https://arxiv.org/abs/2107.12808.
  35. Behaviour suite for reinforcement learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rygf-kSYwH.
  36. Intrinsic motivation systems for autonomous mental development. IEEE transactions on evolutionary computation, 11(2):265–286, 2007.
  37. Evolving curricula with regret-based environment design. In Proceedings of the International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp.  17473–17498. PMLR, 2022. URL https://proceedings.mlr.press/v162/parker-holder22a.html.
  38. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html.
  39. Jaxmarl: Multi-agent rl environments in jax. arXiv preprint arXiv:2311.10090, 2023.
  40. The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019.
  41. Minihack the planet: A sandbox for open-ended reinforcement learning research, 2021. URL https://arxiv.org/abs/2109.13202.
  42. Schmidhuber, J. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats, pp.  222–227, 1991.
  43. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  44. Open-endedness: The last grand challenge you’ve never heard of. While open-endedness could be a force for discovering intelligence, it could also be a component of AI itself, 2017.
  45. Intrinsic motivation and automatic curricula via asymmetric self-play. In 6th International Conference on Learning Representations, ICLR 2018, 2018. URL https://openreview.net/forum?id=SkT5Yg-RZ.
  46. Domain randomization for transferring deep neural networks from simulation to the real world. In International Conference on Intelligent Robots and Systems, pp.  23–30. IEEE, 2017. doi: 10.1109/IROS.2017.8202133. URL https://doi.org/10.1109/IROS.2017.8202133.
  47. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pp.  5026–5033. IEEE, 2012.
  48. Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753, 2019.
  49. How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878, 2019.
  50. Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments. arXiv preprint arXiv:1903.03176, 2019.
Authors (7)
  1. Michael Matthews
  2. Michael Beukman
  3. Benjamin Ellis
  4. Mikayel Samvelyan
  5. Matthew Jackson
  6. Samuel Coward
  7. Jakob Foerster
Citations (16)