
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX (2306.09884v2)

Published 16 Jun 2023 in cs.LG and cs.AI

Abstract: Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. In modern RL research, there is a need for simulated environments that are performant, scalable, and modular to enable their utilization in a wider range of potential real-world applications. Therefore, we present Jumanji, a suite of diverse RL environments specifically designed to be fast, flexible, and scalable. Jumanji provides a suite of environments focusing on combinatorial problems frequently encountered in industry, as well as challenging general decision-making tasks. By leveraging the efficiency of JAX and hardware accelerators like GPUs and TPUs, Jumanji enables rapid iteration of research ideas and large-scale experimentation, ultimately empowering more capable agents. Unlike existing RL environment suites, Jumanji is highly customizable, allowing users to tailor the initial state distribution and problem complexity to their needs. Furthermore, we provide actor-critic baselines for each environment, accompanied by preliminary findings on scaling and generalization scenarios. Jumanji aims to set a new standard for speed, adaptability, and scalability of RL environments.


Summary

  • The paper introduces Jumanji as a suite of 22 scalable and customizable RL environments in JAX that enable efficient experiments on complex real-world challenges.
  • It leverages JAX's just-in-time compilation, vectorization, and hardware acceleration (GPUs/TPUs) to run large-scale RL experiments on NP-hard combinatorial optimization tasks.
  • Empirical results demonstrate substantial speed improvements and robust generalization, driving advancements in applying RL to industrial problems such as logistics.

Jumanji: A Suite of Scalable Reinforcement Learning Environments

The paper introduces Jumanji, a comprehensive suite of Reinforcement Learning (RL) environments developed in JAX. Jumanji aims to address the current limitations of open-source RL environments by offering a highly customizable and scalable set of environments. These environments are designed to be performant, flexible, and adaptable, supporting potential real-world applications.

Core Contributions

  1. Scalability and Flexibility: The paper emphasizes the scalability of Jumanji environments, enabled by their implementation in JAX. JAX's just-in-time compilation and vectorization (jit and vmap), combined with hardware accelerators such as GPUs and TPUs, allow researchers to execute large-scale experiments and iterate rapidly.
  2. Diverse Environment Suite: Jumanji consists of 22 environments spanning routing, packing, and logic tasks, with a focus on NP-hard combinatorial optimization problems (COPs) frequently encountered in industry. This alignment with industry-centric real-world problems provides a strong foundation for RL research.
  3. Customizability: Unlike existing libraries, Jumanji offers a high degree of customization, allowing users to modify initial state distributions and problem complexity easily. This adaptability enables researchers to tailor environments according to specific research goals.
  4. Baselines and Performance: The authors provide an actor-critic baseline for each environment and use these baselines in empirical studies of scaling and generalization, demonstrating how agents perform as problem complexity and initial state distributions vary.
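The speed and scalability described above rest on the functional style JAX requires: `reset` and `step` take and return explicit state instead of mutating hidden state, which is what makes compilation and batching possible. A minimal pure-Python sketch of this pattern (a hypothetical toy counter environment, not Jumanji's actual API) might look like:

```python
from typing import NamedTuple

class State(NamedTuple):
    """Explicit, immutable environment state (nothing hidden or mutated)."""
    position: int
    step_count: int

class TimeStep(NamedTuple):
    """dm_env-style transition record: observation, reward, termination flag."""
    observation: int
    reward: float
    done: bool

def reset(seed: int) -> tuple[State, TimeStep]:
    """Return a fresh state; in JAX this would take a PRNG key instead."""
    state = State(position=seed % 5, step_count=0)
    return state, TimeStep(observation=state.position, reward=0.0, done=False)

def step(state: State, action: int) -> tuple[State, TimeStep]:
    """Pure transition: a new state comes out, the old one is untouched."""
    new_pos = state.position + (1 if action == 1 else -1)
    new_state = State(position=new_pos, step_count=state.step_count + 1)
    done = new_state.step_count >= 10
    reward = 1.0 if new_pos == 0 else 0.0
    return new_state, TimeStep(observation=new_pos, reward=reward, done=done)

state, ts = reset(seed=3)        # position starts at 3
state, ts = step(state, action=1)
print(ts.observation)            # 4
```

Because `reset` and `step` are pure functions of their inputs, a JAX environment written in this shape can be wrapped in `jax.jit` and executed end-to-end on a GPU or TPU.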

Experimental Insights

The experiments showcase Jumanji's ability to scale with hardware resources and to adjust difficulty levels. The paper reports substantial speed improvements from deploying the environments on TPUs and GPUs, and parallelization experiments demonstrate efficient scaling, with throughput growing across hardware configurations.
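The parallelization relies on vectorizing a pure step function over a batch of independent environment states, which is what `jax.vmap` does in compiled form on an accelerator. The effect can be sketched in plain Python with hypothetical toy dynamics (not a Jumanji environment):

```python
def step(state: int, action: int) -> tuple[int, float]:
    """Toy pure transition: the state is just an integer counter."""
    new_state = state + action
    reward = float(new_state)
    return new_state, reward

def batched_step(states: list[int], actions: list[int]) -> tuple[list[int], list[float]]:
    """Plain-Python stand-in for jax.vmap(step): because step is pure,
    applying it to N independent states is trivially parallel."""
    results = [step(s, a) for s, a in zip(states, actions)]
    new_states = [new_s for new_s, _ in results]
    rewards = [r for _, r in results]
    return new_states, rewards

states = [0, 1, 2, 3]            # four environments stepped at once
actions = [1, 1, 1, 1]
states, rewards = batched_step(states, actions)
print(states)                    # [1, 2, 3, 4]
```

Where this sketch loops in Python, `jax.vmap` fuses the batch into a single vectorized kernel, which is why thousands of environments can be stepped in parallel on a single device.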

The paper also explores the impact of training on diverse data distributions, showing that broad initial state distributions can enhance the robustness of RL models. The flexibility in configuring the environments allows for crucial evaluations of an agent's generalization capabilities in more realistic scenarios.
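Jumanji exposes this flexibility through swappable initial-state generators. A hedged sketch of the idea for a TSP-like problem, using hypothetical generator classes rather than Jumanji's actual interface:

```python
import random

class UniformGenerator:
    """Samples problem instances of a fixed size, uniformly at random."""
    def __init__(self, num_cities: int):
        self.num_cities = num_cities

    def __call__(self, seed: int) -> list[tuple[float, float]]:
        rng = random.Random(seed)
        return [(rng.random(), rng.random()) for _ in range(self.num_cities)]

class CurriculumGenerator:
    """Varies problem size across instances, giving the broader
    initial-state distribution used in generalization studies."""
    def __init__(self, min_cities: int, max_cities: int):
        self.min_cities = min_cities
        self.max_cities = max_cities

    def __call__(self, seed: int) -> list[tuple[float, float]]:
        rng = random.Random(seed)
        n = rng.randint(self.min_cities, self.max_cities)
        return [(rng.random(), rng.random()) for _ in range(n)]

fixed = UniformGenerator(num_cities=20)
broad = CurriculumGenerator(min_cities=5, max_cities=50)
assert len(fixed(seed=0)) == 20
assert 5 <= len(broad(seed=0)) <= 50
```

Training against the broad generator and evaluating on the fixed one is the kind of distribution-shift experiment the paper's generalization results describe.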

Theoretical and Practical Implications

The introduction of Jumanji holds significant implications for both the theoretical and practical advancements in RL:

  • Theoretical Perspective: By providing environments that reflect complex real-world challenges, Jumanji encourages further exploration into scalable and generalizable RL solutions. This can lead to the development of algorithms that are more robust and capable of handling industry-relevant tasks.
  • Practical Applications: Jumanji sets a new benchmark for RL environments, assisting researchers in bridging the gap between academic research and industrial applications. Its focus on real-world COPs ensures that solutions developed using Jumanji can be relevant and directly applicable to industry problems, such as logistics and resource management.

Future Directions

The authors acknowledge the potential for expanding Jumanji to include multi-agent scenarios and continuous action spaces. Future work will likely explore these avenues, potentially adding environments that serve a wider array of industrial domains, such as healthcare and agriculture.

In conclusion, Jumanji represents a significant advancement in the domain of RL research tools. By prioritizing scalability, flexibility, and real-world applicability, Jumanji provides a robust platform for developing and testing RL algorithms against complex, industry-inspired challenges.
