Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning (2402.16801v2)
Abstract: Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories: either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour on a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark. We believe that Craftax can, for the first time, allow researchers to experiment in a complex, open-ended environment with limited computational resources.
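The headline speedup comes from the standard JAX pattern of JIT-compiling the environment step and vectorising it across thousands of parallel environments on a single GPU. A minimal sketch of that pattern, using a toy hypothetical environment rather than Craftax's actual API, might look like:

```python
import jax
import jax.numpy as jnp

# Toy environment step (hypothetical, for illustration only): the state is a
# scalar position; action 1 moves right and earns reward, anything else moves left.
def env_step(state, action):
    new_state = state + jnp.where(action == 1, 1.0, -1.0)
    reward = jnp.where(action == 1, 1.0, 0.0)
    return new_state, reward

# Vectorise over a batch of environments, then JIT-compile the batched step.
# One compiled kernel now advances all environments in a single device call.
batched_step = jax.jit(jax.vmap(env_step))

states = jnp.zeros(1024)                    # 1024 parallel environments
actions = jnp.ones(1024, dtype=jnp.int32)   # every agent takes action 1
states, rewards = batched_step(states, actions)
```

Because the rollout loop itself can also be wrapped in `jax.lax.scan` and compiled end-to-end with the PPO update, the whole training pipeline stays on the accelerator, which is what makes a billion environment interactions tractable in under an hour.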
- The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280:103216, 2020.
- Barto, A. G. Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems, pp. 17–47, 2013.
- Unifying count-based exploration and intrinsic motivation. Advances in Neural Information Processing Systems, 29, 2016.
- The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
- Jumanji: A diverse suite of scalable reinforcement learning environments in JAX, 2023. URL https://arxiv.org/abs/2306.09884.
- JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
- OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
- Exploration by random network distillation. CoRR, abs/1810.12894, 2018. URL http://arxiv.org/abs/1810.12894.
- Minigrid & Miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
- Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- Leveraging procedural generation to benchmark reinforcement learning. In International Conference on Machine Learning, pp. 2048–2056. PMLR, 2020.
- Emergent complexity and zero-shot transfer via unsupervised environment design. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/985e9a46e10005356bbaf194249f6856-Abstract.html.
- First return, then explore. Nature, 590(7847):580–586, 2021.
- SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
- Brax: A differentiable physics engine for large scale rigid body simulation, 2021. URL https://arxiv.org/abs/2106.13281.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861–1870. PMLR, 2018.
- Hafner, D. Benchmarking the spectrum of agent capabilities. In International Conference on Learning Representations, 2021.
- Hambro, E. et al. Insights from the NeurIPS 2021 NetHack challenge. In NeurIPS 2021 Competitions and Demonstrations Track, pp. 41–52. PMLR, 2022.
- Exploration via elliptical episodic bonuses. Advances in Neural Information Processing Systems, 35:37631–37646, 2022.
- A study of global and episodic bonuses for exploration in contextual MDPs. arXiv preprint arXiv:2306.03236, 2023.
- Jakobi, N. Evolutionary robotics and the radical envelope-of-noise hypothesis. Adaptive behavior, 6(2):325–368, 1997.
- Replay-guided adversarial environment design. In Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, pp. 1884–1897, 2021a. URL https://proceedings.neurips.cc/paper/2021/hash/0e915db6326b6fb6a3c56546980a8c93-Abstract.html.
- Prioritized level replay. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 4940–4950. PMLR, 2021b. URL http://proceedings.mlr.press/v139/jiang21b.html.
- Grounding aleatoric uncertainty for unsupervised environment design. Advances in Neural Information Processing Systems, 35:32868–32881, 2022.
- minimax: Efficient baselines for autocurricula in JAX. In Agent Learning in Open-Endedness Workshop at NeurIPS, 2023.
- The Malmo platform for artificial intelligence experimentation. In IJCAI, pp. 4246–4247, 2016.
- Pgx: Hardware-accelerated parallel game simulators for reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
- The NetHack Learning Environment. Advances in Neural Information Processing Systems, 33:7671–7684, 2020.
- Lange, R. T. gymnax: A JAX-based reinforcement learning environment library, 2022. URL http://github.com/RobertTLange/gymnax.
- Exploiting open-endedness to solve problems through the search for novelty. In Artificial Life (ALIFE), 2008.
- Discovered policy optimisation. Advances in Neural Information Processing Systems, 35:16455–16468, 2022.
- Revisiting the Arcade Learning Environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research, 61:523–562, 2018.
- XLand-MiniGrid: Scalable meta-reinforcement learning environments in JAX. In Intrinsically-Motivated and Open-Ended Learning Workshop, NeurIPS 2023, 2023. URL https://openreview.net/forum?id=xALDC4aHGz.
- Open Ended Learning Team et al. Open-ended learning leads to generally capable agents, 2021. URL https://arxiv.org/abs/2107.12808.
- Behaviour suite for reinforcement learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rygf-kSYwH.
- Intrinsic motivation systems for autonomous mental development. IEEE transactions on evolutionary computation, 11(2):265–286, 2007.
- Evolving curricula with regret-based environment design. In Proceedings of the International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 17473–17498. PMLR, 2022. URL https://proceedings.mlr.press/v162/parker-holder22a.html.
- Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html.
- JaxMARL: Multi-agent RL environments in JAX. arXiv preprint arXiv:2311.10090, 2023.
- The StarCraft Multi-Agent Challenge. arXiv preprint arXiv:1902.04043, 2019.
- MiniHack the Planet: A sandbox for open-ended reinforcement learning research, 2021. URL https://arxiv.org/abs/2109.13202.
- Schmidhuber, J. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats, pp. 222–227, 1991.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Open-endedness: The last grand challenge you’ve never heard of. O’Reilly Radar, 2017.
- Intrinsic motivation and automatic curricula via asymmetric self-play. In 6th International Conference on Learning Representations, ICLR 2018, 2018. URL https://openreview.net/forum?id=SkT5Yg-RZ.
- Domain randomization for transferring deep neural networks from simulation to the real world. In International Conference on Intelligent Robots and Systems, pp. 23–30. IEEE, 2017. doi: 10.1109/IROS.2017.8202133. URL https://doi.org/10.1109/IROS.2017.8202133.
- MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.
- Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753, 2019.
- How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878, 2019.
- MinAtar: An Atari-inspired testbed for thorough and reproducible reinforcement learning experiments. arXiv preprint arXiv:1903.03176, 2019.
Authors:
- Michael Matthews
- Michael Beukman
- Benjamin Ellis
- Mikayel Samvelyan
- Matthew Jackson
- Samuel Coward
- Jakob Foerster