Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning (2402.16801v2)

Published 26 Feb 2024 in cs.LG

Abstract: Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.

Evaluating Open-Ended Reinforcement Learning with Craftax

Introduction to Craftax

The field of reinforcement learning (RL) has made significant strides, driven in part by benchmarks used to evaluate the performance and capabilities of RL algorithms. A persistent difficulty, however, is balancing the complexity of these benchmarks against their computational demands. Craftax emerges as a solution, offering a sophisticated yet computationally accessible environment for open-ended reinforcement learning research. This post explores Craftax and its implications for future AI developments.

Addressing the Gap in Existing Benchmarks

Current RL benchmarks swing between two extremes: they are either computationally intensive, restricting access to those with significant resources, or they lack complexity, offering little challenge to state-of-the-art methods. Craftax introduces two versions: Craftax-Classic, a remake of Crafter with substantial runtime improvements, and the main Craftax benchmark, which extends the complexity significantly while remaining computationally feasible. The main benchmark incorporates mechanics inspired by the game NetHack, demanding deeper exploration, long-term planning, memory, and continual adaptation from the algorithms it tests.

Craftax-Classic: A Faster Crafter

Craftax-Classic is a ground-up rewrite of Crafter in JAX that runs up to 250 times faster than the Python-native original. A PPO run of one billion environment interactions completes in under an hour on a single GPU and reaches roughly 90% of the optimal reward, showing that near-optimal performance on Crafter is attainable with vastly reduced computational requirements.
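The speedup comes from running the environment itself on the accelerator: because the simulator is written in pure JAX, thousands of environment copies can be stepped in lockstep with jax.vmap and the whole rollout compiled with jax.jit. The sketch below illustrates this pattern with a generic gymnax-style interface; the env.reset/env.step names and signatures are assumptions for illustration, not the exact Craftax API.

```python
import jax

# Vectorised rollout sketch in a gymnax-style interface. The env.reset/env.step
# names and signatures below are assumptions for illustration, not the exact
# Craftax API.

def make_rollout_fn(env, env_params, policy_apply, num_envs, num_steps):
    @jax.jit
    def rollout(rng, policy_params):
        rng, reset_rng = jax.random.split(rng)
        reset_keys = jax.random.split(reset_rng, num_envs)
        # vmap steps a batch of independent environments in lockstep on one device.
        obs, state = jax.vmap(env.reset, in_axes=(0, None))(reset_keys, env_params)

        def step_fn(carry, _):
            rng, obs, state = carry
            rng, act_rng, step_rng = jax.random.split(rng, 3)
            logits = policy_apply(policy_params, obs)          # batched policy forward pass
            action = jax.random.categorical(act_rng, logits)   # one action per environment
            step_keys = jax.random.split(step_rng, num_envs)
            obs, state, reward, done, _ = jax.vmap(
                env.step, in_axes=(0, 0, 0, None)
            )(step_keys, state, action, env_params)
            return (rng, obs, state), (obs, action, reward, done)

        # lax.scan fuses the whole rollout into a single compiled computation,
        # avoiding the Python-interpreter overhead of a step-by-step loop.
        _, trajectory = jax.lax.scan(step_fn, (rng, obs, state), None, length=num_steps)
        return trajectory

    return rollout
```

With this structure the training loop, including the PPO update, can typically also be compiled end to end, which is the kind of setup that lets a billion environment interactions finish in under an hour on a single GPU.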

The Main Craftax Benchmark

Building on the foundation of Craftax-Classic, the main Craftax benchmark introduces more intricate game mechanics, including multiple floors with unique challenges, an overhauled combat system, new creatures, and mechanics like potions, enchantments, and attributes. These features necessitate advanced strategies and in-context learning from agents, pushing the boundaries of current RL algorithms. Preliminary results indicate existing methods struggle to advance in this environment, underscoring the challenge Craftax represents.

Technical Insights and Initial Results

  • Exploration Baselines: Initial investigations ran PPO together with exploration bonuses such as the Intrinsic Curiosity Module (ICM) and Exploration via Elliptical Episodic Bonuses (E3B). These agents made little progress beyond the basic and intermediate achievements in the Craftax world, suggesting substantial room for improvement in exploration strategies; a minimal sketch of the episodic-bonus idea follows this list.
  • UED Baselines: Unsupervised Environment Design (UED) methods such as PLR and ACCEL produced some improvement, but no method achieved marked success on the harder achievements, underscoring how formidable the benchmark remains.
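For readers unfamiliar with the episodic bonus used by E3B, the rough idea is to reward visiting states whose embeddings are poorly covered by the embeddings already seen in the current episode, measured against the ellipse defined by the episode's feature covariance. The following is a minimal sketch of that bonus under the assumption of a fixed embedding function phi; the actual method learns the embedding with an inverse-dynamics model and differs in detail.

```python
import jax.numpy as jnp

# Minimal elliptical-episodic-bonus sketch (in the spirit of E3B), assuming some
# fixed embedding phi(s) of dimension `dim`. Not the paper's exact formulation.

def init_episode(dim, lam=0.1):
    # Reset at every episode start: C^{-1} with C = lam * I.
    return jnp.eye(dim) / lam

def episodic_bonus(cov_inv, phi):
    """Return the bonus phi^T C^{-1} phi and the rank-one (Sherman-Morrison)
    update of C^{-1} after adding phi to the episode's covariance."""
    bonus = phi @ cov_inv @ phi
    u = cov_inv @ phi
    new_cov_inv = cov_inv - jnp.outer(u, u) / (1.0 + bonus)
    return bonus, new_cov_inv
```

During training the bonus is added to the extrinsic reward at each step, and cov_inv is reset whenever an episode ends, which is what makes the bonus episodic; global bonuses such as ICM's prediction error instead persist across the whole of training.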

Implications and Future Directions

Craftax sets a new standard for evaluating and advancing RL algorithms. Its complexity coupled with computational accessibility enables broader participation in cutting-edge AI research. The environment's depth and breadth suggest it could drive significant progress in areas critical to the development of general artificial intelligence, such as deep exploration, skill acquisition, generalization, and long-term strategic planning. As researchers begin to tackle the challenges Craftax presents, we can anticipate novel approaches and methodologies emerging, pushing the frontiers of what's achievable in open-ended RL.

Conclusion

Craftax is a pivotal development in the pursuit of more sophisticated and capable reinforcement learning algorithms. By balancing computational efficiency with complex, open-ended dynamics, it offers a new arena for researchers to test the limits of current RL methodologies and innovate towards the next generation of AI capabilities. The journey to mastering Craftax is just beginning, and the insights gained will undoubtedly influence the course of AI research in the years to come.

References (50)
  1. The hanabi challenge: A new frontier for ai research. Artificial Intelligence, 280:103216, 2020.
  2. Barto, A. G. Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems, pp.  17–47, 2013.
  3. Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems, 29, 2016.
  4. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
  5. Jumanji: a diverse suite of scalable reinforcement learning environments in jax, 2023. URL https://arxiv.org/abs/2306.09884.
  6. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
  7. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
  8. Exploration by random network distillation. CoRR, abs/1810.12894, 2018. URL http://arxiv.org/abs/1810.12894.
  9. Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
  10. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
  11. Leveraging procedural generation to benchmark reinforcement learning. In International conference on machine learning, pp.  2048–2056. PMLR, 2020.
  12. Emergent complexity and zero-shot transfer via unsupervised environment design. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/985e9a46e10005356bbaf194249f6856-Abstract.html.
  13. First return, then explore. Nature, 590(7847):580–586, 2021.
  14. Smacv2: An improved benchmark for cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  15. Brax – a differentiable physics engine for large scale rigid body simulation, 2021. URL https://arxiv.org/abs/2106.13281.
  16. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pp.  1861–1870. PMLR, 2018.
  17. Hafner, D. Benchmarking the spectrum of agent capabilities. In International Conference on Learning Representations, 2021.
  18. Hambro, E. et al. Insights from the neurips 2021 nethack challenge. In NeurIPS 2021 Competitions and Demonstrations Track, pp.  41–52. PMLR, 2022.
  19. Exploration via elliptical episodic bonuses. Advances in Neural Information Processing Systems, 35:37631–37646, 2022.
  20. A study of global and episodic bonuses for exploration in contextual mdps. arXiv preprint arXiv:2306.03236, 2023.
  21. Jakobi, N. Evolutionary robotics and the radical envelope-of-noise hypothesis. Adaptive behavior, 6(2):325–368, 1997.
  22. Replay-guided adversarial environment design. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, pp.  1884–1897, 2021a. URL https://proceedings.neurips.cc/paper/2021/hash/0e915db6326b6fb6a3c56546980a8c93-Abstract.html.
  23. Prioritized level replay. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  4940–4950. PMLR, 2021b. URL http://proceedings.mlr.press/v139/jiang21b.html.
  24. Grounding aleatoric uncertainty for unsupervised environment design. Advances in Neural Information Processing Systems, 35:32868–32881, 2022.
  25. minimax: Efficient baselines for autocurricula in jax. In Agent Learning in Open-Endedness Workshop at NeurIPS, 2023.
  26. The malmo platform for artificial intelligence experimentation. In Ijcai, pp.  4246–4247, 2016.
  27. Pgx: Hardware-accelerated parallel game simulators for reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
  28. The nethack learning environment. Advances in Neural Information Processing Systems, 33:7671–7684, 2020.
  29. Lange, R. T. gymnax: A JAX-based reinforcement learning environment library, 2022. URL http://github.com/RobertTLange/gymnax.
  30. Exploiting open-endedness to solve problems through the search for novelty. Artificial Life - ALIFE, 01 2008.
  31. Discovered policy optimisation. Advances in Neural Information Processing Systems, 35:16455–16468, 2022.
  32. Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research, 61:523–562, 2018.
  33. XLand-minigrid: Scalable meta-reinforcement learning environments in JAX. In Intrinsically-Motivated and Open-Ended Learning Workshop, NeurIPS2023, 2023. URL https://openreview.net/forum?id=xALDC4aHGz.
  34. Open Ended Learning Team et al. Open-ended learning leads to generally capable agents, 2021. URL https://arxiv.org/abs/2107.12808.
  35. Behaviour suite for reinforcement learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rygf-kSYwH.
  36. Intrinsic motivation systems for autonomous mental development. IEEE transactions on evolutionary computation, 11(2):265–286, 2007.
  37. Evolving curricula with regret-based environment design. In Proceedings of the International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp.  17473–17498. PMLR, 2022. URL https://proceedings.mlr.press/v162/parker-holder22a.html.
  38. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html.
  39. Jaxmarl: Multi-agent rl environments in jax. arXiv preprint arXiv:2311.10090, 2023.
  40. The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019.
  41. Minihack the planet: A sandbox for open-ended reinforcement learning research, 2021. URL https://arxiv.org/abs/2109.13202.
  42. Schmidhuber, J. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats, pp.  222–227, 1991.
  43. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  44. Open-endedness: The last grand challenge you’ve never heard of. While open-endedness could be a force for discovering intelligence, it could also be a component of AI itself, 2017.
  45. Intrinsic motivation and automatic curricula via asymmetric self-play. In 6th International Conference on Learning Representations, ICLR 2018, 2018. URL https://openreview.net/forum?id=SkT5Yg-RZ.
  46. Domain randomization for transferring deep neural networks from simulation to the real world. In International Conference on Intelligent Robots and Systems, pp.  23–30. IEEE, 2017. doi: 10.1109/IROS.2017.8202133. URL https://doi.org/10.1109/IROS.2017.8202133.
  47. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pp.  5026–5033. IEEE, 2012.
  48. Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753, 2019.
  49. How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878, 2019.
  50. Minatar: An atari-inspired testbed for thorough and reproducible reinforcement learning experiments. arXiv preprint arXiv:1903.03176, 2019.
Authors (7)
  1. Michael Matthews
  2. Michael Beukman
  3. Benjamin Ellis
  4. Mikayel Samvelyan
  5. Matthew Jackson
  6. Samuel Coward
  7. Jakob Foerster
Citations (16)