Minigrid and Miniworld Libraries
- Minigrid and Miniworld are open-source libraries that provide minimalistic, modular reinforcement learning environments for goal-oriented tasks.
- They enable rapid prototyping and transfer of RL policies across 2D and 3D settings, supporting applications like curriculum learning and meta-learning.
- JAX-based extensions such as NAVIX and XLand-MiniGrid boost scalability by leveraging hardware acceleration to achieve millions of simulation steps per second.
Minigrid and Miniworld are widely adopted open-source libraries that provide modular, customizable reinforcement learning (RL) environments for goal-oriented tasks. Built around a minimalistic design philosophy, these Python libraries let researchers design, prototype, and analyze RL algorithms efficiently. Both serve as reference environments for evaluating RL agents and facilitate research in areas including curriculum learning, language-conditioned policies, transfer learning, exploration, and meta-learning (Chevalier-Boisvert et al., 2023). The emergence of JAX-accelerated frameworks such as XLand-MiniGrid (Nikulin et al., 2023) and NAVIX (Pignatelli et al., 2024) has further extended the scalability and throughput of experiments, enabling large-scale RL research previously constrained by CPU-bound simulation bottlenecks.
1. Minimalistic and Modular Design Principles
Both Minigrid and Miniworld were conceived under a minimalistic design paradigm to minimize code complexity and external dependencies, thereby supporting rapid prototyping and environment extension by researchers:
- Implementation: Both are implemented in Python and build on the OpenAI Gym (now Gymnasium) API, enabling compatibility with standard RL toolsets.
- Dependencies: Minigrid relies primarily on NumPy for its GridWorld backend; Miniworld depends on Pyglet for 2.5D/3D rendering.
- Transparency and Extensibility: Codebases are intentionally kept simple, making it straightforward for users to understand and customize environments to their experimental requirements (Chevalier-Boisvert et al., 2023).
This approach contrasts with more complex simulated environments that carry extensive asset or physics dependencies, and it fosters open experimentation and straightforward integration with algorithm libraries such as Stable-Baselines3.
2. Environment Architecture and API Structure
Minigrid
- Environment Class: Provides a collection of 2D GridWorld environments, each represented by a grid of tiles. Tiles can include empty spaces, walls, keys, goals, and other entity types.
- POMDP Structure: Each environment is formulated as a partially observable Markov decision process (POMDP). The agent receives observations as a dictionary containing:
- An "image" (top-down view, possibly with limited range),
- "direction" (integer encoding the agent's facing),
- "mission" (textual instruction, e.g., “go to the {color} {object}”).
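The shape of this observation dictionary can be illustrated with a short standard-library sketch; the function below mimics the structure described above and is not the library's actual code.

```python
# Minimal stdlib sketch of a Minigrid-style observation dictionary.
# This mimics the structure described above; it is NOT the library's code.

def make_observation(agent_dir: int, mission: str, view_size: int = 7):
    """Build an observation dict shaped like Minigrid's.

    "image" is a view_size x view_size grid of (object, color, state)
    triples; here it is filled with zeros (empty tiles) for illustration.
    """
    image = [[(0, 0, 0) for _ in range(view_size)] for _ in range(view_size)]
    return {
        "image": image,          # egocentric, possibly range-limited view
        "direction": agent_dir,  # 0=right, 1=down, 2=left, 3=up
        "mission": mission,      # textual instruction
    }

obs = make_observation(agent_dir=0, mission="go to the red ball")
print(sorted(obs.keys()))  # ['direction', 'image', 'mission']
```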
Miniworld
- 3D World Model: Consists of connected rooms and various objects within a 2.5D discrete spatial structure (flat floorplan for computational simplicity).
- Observation and Action: The agent's default observation is an RGB image; the discrete action space typically has eight actions (including movements such as forward, turning, and "move back") (Chevalier-Boisvert et al., 2023).
Unified API
A shared API design abstracts away differences between 2D (Minigrid) and 3D (Miniworld) representations, enabling seamless transfer of code, learning setups, and policy architectures. Environment generation for both follows a streamlined pattern:
- Minigrid: _gen_grid() constructs the grid structure, places entities, and assigns a randomized starting agent position.
- Miniworld: _gen_world() defines rooms, object placement, and agent initialization.
The unified design allows direct parameter transfer (such as actor/critic weights) between environments, a feature leveraged in transfer learning experiments.
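The generation pattern above can be sketched as a base class that calls an overridable hook on reset. The class and method names below mirror the pattern described in the text, not the real codebases.

```python
# Stdlib sketch of the shared environment-generation pattern: a base class
# calls a _gen_grid()-style hook on reset, which subclasses override.
import random

class BaseGridEnv:
    def __init__(self, width, height, seed=None):
        self.width, self.height = width, height
        self.rng = random.Random(seed)
        self.grid = None
        self.agent_pos = None

    def reset(self):
        self._gen_grid()  # subclass builds layout, entities, agent start
        return self._observe()

    def _gen_grid(self):
        raise NotImplementedError

    def _observe(self):
        return {"agent_pos": self.agent_pos}

class EmptyRoom(BaseGridEnv):
    def _gen_grid(self):
        # Walls on the border, a goal in one corner, random agent start.
        self.grid = [["wall" if x in (0, self.width - 1) or y in (0, self.height - 1)
                      else "empty" for x in range(self.width)]
                     for y in range(self.height)]
        self.grid[self.height - 2][self.width - 2] = "goal"
        self.agent_pos = (self.rng.randint(1, self.width - 2),
                          self.rng.randint(1, self.height - 2))

obs = EmptyRoom(5, 5, seed=0).reset()
```

Because a Miniworld counterpart would expose the same reset/observe surface (with `_gen_world()` in place of `_gen_grid()`), policy code written against this interface carries over unchanged.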
3. Advances in Scalability: JAX-based Reimplementations
Recent advancements have addressed traditional computational bottlenecks associated with CPU-bound, Python-based simulation:
NAVIX
- Architectural Overview: NAVIX is a JAX-based reimplementation of Minigrid that retains full compatibility with the original POMDP logic and environment dynamics. Each environment is specified by the standard POMDP tuple $(S, A, T, R, \Omega, O, \gamma)$, with all transitions occurring in JAX.
- Entity-Component-System Model (ECSM): Entities (e.g., agent, keys, doors) are decomposed into components (Position, Direction, Colour, etc.), processed by distinct systems (Transition, Observation, Reward).
- Stateful Batch Simulation: Uses a jittable "timestep" representation, permitting vectorization and parallel rollout of multiple environments via JAX primitives (jax.vmap, jax.lax.scan).
- Throughput: On an Nvidia A100 80GB, NAVIX attains throughput on the order of millions of steps per second with 2048 parallel agents, a speedup of several orders of magnitude over the original CPU-bound MiniGrid implementation (Pignatelli et al., 2024).
- Autoreset and Markovian Rewards: Implements autoreset via embedding reset logic in the main JAX step function; uses a Markovian reward ($0$ for all steps, $1$ at task completion).
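The entity-component-system decomposition can be sketched in plain Python: entities are bags of components, and systems are functions over them. Component and system names follow the text above; the code is illustrative, not NAVIX's implementation.

```python
# Stdlib sketch of an entity-component-system decomposition: entities are
# dicts of components; systems (transition, reward) are plain functions.
from dataclasses import dataclass

@dataclass
class Position:
    x: int
    y: int

@dataclass
class Direction:
    d: int  # 0=east, 1=south, 2=west, 3=north

def make_agent(x, y, d):
    return {"Position": Position(x, y), "Direction": Direction(d)}

DIR_TO_VEC = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def transition(entity, action):
    """Transition system: rotate left/right or step forward."""
    pos, dr = entity["Position"], entity["Direction"]
    if action == "left":
        dr.d = (dr.d - 1) % 4
    elif action == "right":
        dr.d = (dr.d + 1) % 4
    elif action == "forward":
        dx, dy = DIR_TO_VEC[dr.d]
        pos.x, pos.y = pos.x + dx, pos.y + dy
    return entity

def reward(entity, goal):
    """Reward system: Markovian, 1 only on task completion."""
    p = entity["Position"]
    return 1.0 if (p.x, p.y) == goal else 0.0

agent = make_agent(1, 1, d=0)
transition(agent, "forward")  # agent moves east to (2, 1)
```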
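The batched-simulation idea itself is independent of JAX and can be shown with a NumPy analogue: one vectorized step advances all environments at once. NAVIX does this with jax.vmap/lax.scan on accelerators; this CPU sketch only conveys the shape of the computation.

```python
# NumPy sketch of batched simulation: one vectorized step() advances N
# environments at once (the accelerator-native version uses jax.vmap).
import numpy as np

N = 8                                    # number of parallel environments
pos = np.zeros((N, 2), dtype=np.int32)   # agent (x, y) per environment
goal = np.array([3, 0], dtype=np.int32)

def step_forward(pos):
    """Move every agent one cell east, in a single vectorized op."""
    return pos + np.array([1, 0], dtype=np.int32)

for _ in range(3):
    pos = step_forward(pos)

done = np.all(pos == goal, axis=1)       # per-env termination flags
reward = done.astype(np.float32)         # Markovian: 1 only at completion
print(reward.sum())                      # 8.0 — every env reached the goal
```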
XLand-MiniGrid
- Meta-Learning Focus: Leverages a flexible system of rules and goals for procedurally composing tasks as "task trees" with arbitrary compositional depth and dynamic rules, facilitating meta-RL experiments.
- Stateless JAX API: Employs stateless, jit-compatible reset and step methods with full hardware acceleration (jax.vmap/pmap).
- Scalability: Achieves millions of steps per second, scaling nearly linearly with numbers of environments/devices.
- Benchmarks: Provides millions of unique, pre-sampled task specifications (rulesets), with four preset difficulty levels (trivial, small, medium, high).
- Mathematical Structure: TimeStep includes a discount $d_t \in \{0, 1\}$, where $d_t = 0$ indicates terminality; PPO training uses the standard clipped policy objective
$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$
with probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ (Nikulin et al., 2023).
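The clipped PPO surrogate mentioned above can be sketched numerically for a single (state, action) sample; `eps` is the clip parameter.

```python
# Stdlib sketch of the clipped PPO policy objective for one sample.
import math

def ppo_clip_term(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    # PPO maximizes the minimum of the two surrogate terms.
    return min(ratio * advantage, clipped * advantage)

# A large policy shift on a positive-advantage sample gets clipped:
print(ppo_clip_term(logp_new=0.5, logp_old=0.0, advantage=1.0))  # 1.2
```

The clipping keeps single updates from moving the policy too far from the one that collected the data, which is what makes large vectorized batches (as in XLand-MiniGrid) stable to train on.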
4. Case Studies: Transfer Learning and Human Adaptation
The unified API of Minigrid and Miniworld supports both agent and human transfer learning experiments:
- RL Policy Transfer: PPO agents are trained in a Minigrid environment and their weights transferred to corresponding Miniworld agents. Performance is measured by the area under the reward curve (AUC), comparing transferred agents against agents trained from scratch. Optimal transfer was observed when only critic networks and mission embeddings were transferred, without freezing their weights (Chevalier-Boisvert et al., 2023).
- Human Studies: Experimental results with ten human subjects indicate that exposure to a Minigrid scenario before switching to Miniworld improved adaptation metrics, such as reward trajectory, even under randomized control schemes.
These results highlight the utility of the libraries for probing cross-modal transfer and the effects of prior experience in geometric and observation transformations.
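An AUC-based comparison of this kind can be operationalized as below. The ratio-style effectiveness metric is an illustrative assumption, not necessarily the exact definition used by Chevalier-Boisvert et al. (2023), and the learning curves are hypothetical.

```python
# Stdlib sketch of an AUC-based transfer comparison. The effectiveness
# ratio is an illustrative choice (assumption), not the paper's metric.

def auc(rewards):
    """Trapezoidal area under a per-step reward curve (unit step size)."""
    return sum((a + b) / 2 for a, b in zip(rewards, rewards[1:]))

def transfer_effectiveness(transferred, from_scratch):
    """Ratio > 1 means the transferred agent accumulated more reward."""
    return auc(transferred) / auc(from_scratch)

scratch = [0.0, 0.1, 0.3, 0.6, 0.8]   # hypothetical learning curves
warm    = [0.2, 0.5, 0.7, 0.8, 0.9]
print(round(transfer_effectiveness(warm, scratch), 2))  # 1.82
```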
5. Community Impact and Research Ecosystem
Minigrid and Miniworld have become standard environments in the RL community, supporting research in areas such as:
- Safe RL
- Curiosity-driven exploration
- Meta-learning
- Language-conditioned RL (e.g., BabyAI builds directly on Minigrid)
- Credit assignment and policy generalization (Chevalier-Boisvert et al., 2023; Pignatelli et al., 2024)
Project popularity is reflected in over 2400 GitHub stars and 620 forks for Minigrid. Open-source code and documentation are actively maintained by the Farama Foundation, with ongoing updates to support evolving research needs.
The JAX-based successors (NAVIX, XLand-MiniGrid) have democratized rapid experimentation by removing throughput bottlenecks, enabling single-GPU research to reach millions of steps per second. This shortens experimental cycles, reduces training wall-clock time from weeks to minutes, and lowers the barrier to prototyping advanced RL ideas (Pignatelli et al., 2024; Nikulin et al., 2023).
6. Comparative Summary
| Library | Core API | Rendering | Batch/Parallelization | Main Research Domain |
|---|---|---|---|---|
| Minigrid | Python (Gymnasium) | 2D, NumPy | Classic, CPU Python | Curriculum/meta-learning, RL |
| Miniworld | Python (Gymnasium) | 3D, Pyglet | Classic, CPU Python | Goal-oriented spatial RL |
| XLand-MiniGrid | JAX | N/A | GPU/TPU, millions of steps/s | Meta-RL, benchmarks, scaling |
| NAVIX | JAX | N/A | GPU, 2048 parallel agents | Batched, scalable RL |
Minigrid/Miniworld prioritize interpretability and extensibility in classic RL setups, while NAVIX and XLand-MiniGrid address the need for scalable, accelerator-native simulation.
7. Source Code Availability and Licensing
- Minigrid GitHub: https://github.com/Farama-Foundation/Minigrid
- Miniworld GitHub: https://github.com/Farama-Foundation/Miniworld
- XLand-MiniGrid: https://github.com/dunnolab/xland-minigrid
- NAVIX: repository URL provided in the accompanying paper (Pignatelli et al., 2024)
All libraries are released under the Apache-2.0 license with extensive documentation and tutorials available at https://minigrid.farama.org/ and https://miniworld.farama.org/, ensuring accessibility and ongoing community extension.