Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks (2306.13831v1)

Published 24 Jun 2023 in cs.LG

Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/.

Summary

  • The paper presents modular RL environments that enable rapid creation of customizable 2D and 3D goal-oriented tasks.
  • It details a minimalistic design and seamless API integration with Python and Gym, reducing dependencies for easier adoption.
  • The work demonstrates practical utility through case studies in curriculum, exploration, and transfer learning across environments.

Overview of "Minigrid and Miniworld: Modular and Customizable Reinforcement Learning Environments for Goal-Oriented Tasks"

The paper presents the Minigrid and Miniworld libraries, which offer a set of customizable 2D and 3D environments for reinforcement learning (RL) focused on goal-oriented tasks. Developed with a minimalistic design, these libraries aim to facilitate rapid environment creation and have achieved broad adoption within the RL community. This work outlines the design philosophy, environment details, and the APIs of these libraries, emphasizing their utility across a range of RL research areas.

Design Philosophy

Minigrid and Miniworld were created with simplicity and customizability as primary goals to accommodate diverse research-specific needs. Their implementation using Python and the Gym RL environment API ensures seamless integration with existing machine learning tools. The libraries have minimal dependencies (e.g., NumPy for Minigrid and Pyglet for Miniworld), which simplifies installation and reduces potential technical issues.
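As a concrete illustration of this integration, the sketch below (assuming `minigrid` is installed, e.g. via pip) instantiates one of the library's stock environments through the standard Gym/Gymnasium API and steps it with a random policy; the environment ID shown is taken from the library's catalogue rather than from the paper.

```python
# Minimal sketch: create a stock Minigrid environment through the Gymnasium API
# and run one episode with a random policy.
import gymnasium as gym
import minigrid  # importing the package registers the MiniGrid-* environment IDs

env = gym.make("MiniGrid-Empty-8x8-v0")
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```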

Environment Features

Minigrid:

  • Composed of 2D GridWorld environments with deterministic dynamics.
  • Observations include a rendered agent view, direction, and text-based mission.
  • Actions are discrete, encompassing movements and interactions with grid objects.
  • The reward function is sparse by default and customizable through user-defined modifications (a short sketch of the observation and action interfaces follows this list).
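A short sketch of those interfaces, assuming `minigrid` is installed; the environment ID is one of the stock tasks and the printed values are illustrative.

```python
# Inspect the dict observation (image, direction, mission) and the discrete action set.
import gymnasium as gym
import minigrid  # registers the MiniGrid-* environment IDs

env = gym.make("MiniGrid-DoorKey-5x5-v0")
obs, _ = env.reset(seed=0)
print(obs["image"].shape)   # encoded partial agent view, (7, 7, 3) by default
print(obs["direction"])     # agent heading as an integer in {0, 1, 2, 3}
print(obs["mission"])       # text-based mission string for the task
print(env.action_space)     # discrete actions: turn, move forward, pick up, drop, toggle, done
env.close()
```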

Miniworld:

  • Consists of 3D environments with room-object configurations.
  • Observations are RGB images from the agent’s perspective.
  • The discrete action set includes an additional "move back" action not present in Minigrid.
  • Like Minigrid, the reward structure is sparse and user-customizable (see the sketch after this list).
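A parallel sketch for Miniworld, assuming `miniworld` is installed; the environment ID and default observation size follow the library's documentation and may vary by version.

```python
# Create a Miniworld task and inspect its first-person RGB observation.
import gymnasium as gym
import miniworld  # registers the MiniWorld-* environment IDs

env = gym.make("MiniWorld-OneRoom-v0")
obs, _ = env.reset(seed=0)
print(obs.shape)            # first-person RGB view, e.g. (60, 80, 3) by default
print(env.action_space)     # discrete actions; many tasks expose only a subset of the full set
env.close()
```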

Both libraries offer straightforward mechanisms for creating and extending environments, aided by comprehensive documentation and tutorials. They are compatible with reinforcement learning libraries such as Stable-Baselines3.
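The world-generation API at the core of both libraries amounts to subclassing the base environment and overriding a single generation hook. The sketch below follows the pattern in the Minigrid documentation's custom-environment tutorial; the class name is hypothetical, and constructor arguments such as mission_space and max_steps may differ slightly across versions.

```python
# Sketch of a custom Minigrid environment using the world-generation API:
# an empty walled room with a goal square in the far corner.
from minigrid.core.grid import Grid
from minigrid.core.mission import MissionSpace
from minigrid.core.world_object import Goal
from minigrid.minigrid_env import MiniGridEnv

class EmptyRoomEnv(MiniGridEnv):
    def __init__(self, size: int = 8, **kwargs):
        super().__init__(
            mission_space=MissionSpace(mission_func=lambda: "get to the green goal square"),
            grid_size=size,
            max_steps=4 * size * size,
            **kwargs,
        )

    def _gen_grid(self, width, height):
        # Called on every reset: lay out walls, place objects, and position the agent.
        self.grid = Grid(width, height)
        self.grid.wall_rect(0, 0, width, height)
        self.put_obj(Goal(), width - 2, height - 2)
        self.place_agent()
        self.mission = "get to the green goal square"
```

Because the subclass follows the standard Gym interface, an instance such as EmptyRoomEnv(size=6) can be handed to training libraries directly; for image-based policies in Stable-Baselines3, the dict observation is typically reduced with a wrapper such as minigrid.wrappers.ImgObsWrapper first.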

Research Adoption and Utility

Minigrid and Miniworld have been adopted widely across RL research domains:

  • Curriculum Learning: Used for generating scalable learning environments.
  • Exploration: Their sparse rewards make them ideal for developing robust exploration strategies.
  • Meta and Transfer Learning: The libraries support experimentation with new learning algorithms and transfer across diverse environments.

Case Studies

Two case studies demonstrate the libraries' capabilities in transfer learning:

  1. RL Agent Transfer Learning: Policies trained on Minigrid were transferred to matching Miniworld environments, and different strategies for transferring network components were compared, showing that certain configurations enhanced transfer (a schematic sketch of the general idea follows this list).
  2. Human Transfer Learning: Human subjects transitioned from Minigrid to Miniworld environments, showcasing adaptability in navigation tasks. This highlighted the potential for using these libraries in human-in-the-loop systems.
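The following schematic, written in plain PyTorch with hypothetical module names and sizes, illustrates the general idea behind the first case study rather than the authors' actual procedure: reuse the perception component of a policy trained in one observation space when initializing a policy for another, while re-initializing the task-specific head.

```python
# Schematic of transferring part of a policy network between environments.
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(                          # perception component
            nn.Conv2d(in_channels, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, n_actions)                   # control component

    def forward(self, x):
        return self.head(self.encoder(x))

source = Policy(in_channels=3, n_actions=7)   # e.g. trained on a Minigrid task
target = Policy(in_channels=3, n_actions=8)   # e.g. to be trained on a Miniworld task

# Transfer only the encoder weights; the action head is re-initialized because
# the target task uses a different action space.
target.encoder.load_state_dict(source.encoder.state_dict())
for p in target.encoder.parameters():
    p.requires_grad = False  # optionally freeze the transferred component
```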

Implications and Future Directions

The Minigrid and Miniworld libraries provide vital infrastructure for experimenting with RL algorithms in customizable settings. Their design fosters experimentation in both theoretical and practical aspects of RL, such as safe RL, curiosity-driven exploration, and real-world applicability. Future developments include enhancing human-in-the-loop capabilities, although the libraries remain limited by their simplified environment structures and Python-based implementation.

The comprehensive documentation and open-source availability on GitHub underline the libraries' accessibility and potential for further contribution by the broader research community.