
Minigrid and Miniworld Libraries

Updated 4 October 2025
  • Minigrid and Miniworld are open-source libraries that provide minimalistic, modular reinforcement learning environments for goal-oriented tasks.
  • They enable rapid prototyping and transfer of RL policies across 2D and 3D settings, supporting applications like curriculum learning and meta-learning.
  • JAX-based extensions such as NAVIX and XLand-MiniGrid boost scalability by leveraging hardware acceleration to achieve millions of simulation steps per second.

Minigrid and Miniworld are widely adopted open-source libraries that provide modular and customizable reinforcement learning (RL) environments for goal-oriented tasks. Originating from a minimalistic design philosophy, these Python libraries enable researchers to design, prototype, and analyze RL algorithms efficiently. Both serve as reference environments for evaluating RL agents and facilitate research in diverse areas including curriculum learning, language-conditioned policies, curriculum transfer, exploration, and meta-learning (Chevalier-Boisvert et al., 2023). The emergence of JAX-accelerated frameworks such as XLand-MiniGrid (Nikulin et al., 2023) and NAVIX (Pignatelli et al., 28 Jul 2024) has further extended the scalability and throughput of experiments, enabling large-scale RL research previously constrained by CPU-bound simulation bottlenecks.

1. Minimalistic and Modular Design Principles

Both Minigrid and Miniworld were conceived under a minimalistic design paradigm to minimize code complexity and external dependencies, thereby supporting rapid prototyping and environment extension by researchers:

  • Implementation: Both are implemented in Python and build on the OpenAI Gym (now Gymnasium) API, enabling compatibility with standard RL toolsets.
  • Dependencies: Minigrid relies primarily on NumPy for its GridWorld backend; Miniworld depends on Pyglet for 2.5D/3D rendering.
  • Transparency and Extensibility: Codebases are intentionally kept simple, making it straightforward for users to understand and customize environments to their experimental requirements (Chevalier-Boisvert et al., 2023).
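Because both libraries expose the Gymnasium interface, an agent interacts with any of their environments through reset(), which returns (observation, info), and step(action), which returns (observation, reward, terminated, truncated, info). The class below is a hypothetical, dependency-free stand-in (ToyCorridorEnv is not part of either library) that only mimics this contract, so the standard interaction loop can be read and run without installing anything.

```python
from typing import Any, Dict, Optional, Tuple


class ToyCorridorEnv:
    """Hypothetical 1-D corridor that only mimics the Gymnasium reset/step contract."""

    def __init__(self, length: int = 5, max_steps: int = 20) -> None:
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[Dict[str, Any], Dict]:
        # Gymnasium-style reset returns (observation, info). The seed is accepted
        # for API parity only; this toy environment is deterministic.
        self.pos = 0
        self.steps = 0
        return {"position": self.pos}, {}

    def step(self, action: int) -> Tuple[Dict[str, Any], float, bool, bool, Dict]:
        # Action 1 moves right, any other action moves left; walls clamp the position.
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        self.steps += 1
        terminated = self.pos == self.length - 1  # goal reached (true episode end)
        truncated = self.steps >= self.max_steps  # time limit hit (episode cut short)
        reward = 1.0 if terminated else 0.0
        return {"position": self.pos}, reward, terminated, truncated, {}


env = ToyCorridorEnv()
obs, info = env.reset(seed=0)
total = 0.0
while True:
    obs, reward, terminated, truncated, info = env.step(1)  # always move right
    total += reward
    if terminated or truncated:
        break
print(obs, total)  # {'position': 4} 1.0
```

Replacing ToyCorridorEnv with an environment created through gymnasium.make leaves the loop unchanged; that shared contract is what lets generic RL libraries drive these environments.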

This design contrasts with more complex simulators that carry extensive asset or physics-engine dependencies. The small codebases and standard API encourage open experimentation and make integration with algorithm libraries such as Stable-Baselines3 straightforward.

2. Environment Architecture and API Structure

Minigrid

  • Environment Class: Provides a collection of 2D GridWorld environments, each represented by an $n \times m$ grid of tiles. Tiles can include empty spaces, walls, keys, goals, and other entity types.
  • POMDP Structure: Each environment is formulated as a partially observable Markov decision process (POMDP). The agent receives observations as a dictionary containing:
    • "image": an egocentric, top-down partial view of the grid (7×7 tiles by default), with each tile encoded by object type, color, and state;
    • "direction": an integer encoding the agent's facing;
    • "mission": a textual instruction, e.g., “go to the {color} {object}”.
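The egocentric "image" entry can be illustrated with a simplified, pure-Python sketch. It assumes Minigrid's direction convention (0 = right, 1 = down, 2 = left, 3 = up), stores tiles as strings instead of Minigrid's (object, color, state) integer triples, uses a 3×3 view instead of the 7×7 default, and ignores occlusion; egocentric_view and make_observation are hypothetical helpers, not library functions. The crop is rotated so the agent always sits at the bottom centre, facing up.

```python
from typing import Dict, List, Tuple

Grid = List[List[str]]

# Assumed direction convention: 0=right, 1=down, 2=left, 3=up (y grows downward).
FORWARD = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}
RIGHT = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}  # agent's right-hand side


def egocentric_view(grid: Grid, agent_pos: Tuple[int, int], direction: int,
                    view_size: int = 3) -> Grid:
    """Crop view_size x view_size tiles in front of the agent, rotated so it faces up."""
    ax, ay = agent_pos
    fx, fy = FORWARD[direction]
    rx, ry = RIGHT[direction]
    half = view_size // 2
    view = []
    for row in range(view_size):
        ahead = view_size - 1 - row  # bottom row is the agent's own row
        line = []
        for col in range(view_size):
            side = col - half  # negative = agent's left, positive = agent's right
            x = ax + ahead * fx + side * rx
            y = ay + ahead * fy + side * ry
            inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
            line.append(grid[y][x] if inside else "unseen")
        view.append(line)
    return view


def make_observation(grid: Grid, agent_pos: Tuple[int, int], direction: int,
                     mission: str, view_size: int = 3) -> Dict[str, object]:
    # Same three keys as the Minigrid observation dictionary described above.
    return {
        "image": egocentric_view(grid, agent_pos, direction, view_size),
        "direction": direction,
        "mission": mission,
    }


grid = [
    ["wall", "wall", "wall", "wall", "wall"],
    ["wall", "empty", "empty", "goal", "wall"],
    ["wall", "empty", "empty", "empty", "wall"],
    ["wall", "wall", "wall", "wall", "wall"],
]
obs = make_observation(grid, agent_pos=(1, 2), direction=0, mission="go to the green goal")
for row in obs["image"]:
    print(row)
# ['goal', 'empty', 'wall']
# ['empty', 'empty', 'wall']
# ['empty', 'empty', 'wall']
```

Facing right from (1, 2), the goal appears ahead and to the agent's left, and the bottom wall appears on its right, which is why the same agent sees a different image after merely turning.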

Miniworld

  • 3D World Model: Each environment consists of connected rooms containing objects, laid out on a flat 2.5D floor plan (no vertical movement) for computational simplicity.
  • Observation and Action: The agent's default observation is an $80 \times 60$ RGB image; the discrete action space typically has eight actions (including movements such as forward, turning, and "move back") (Chevalier-Boisvert et al., 2023).
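The following sketch shows how an eight-action discrete space can drive movement on a flat 2.5D floor plan. The action names mirror the movements listed above, but their exact names and ordering, the step length (0.15), the turn angle (15°), and the apply_action helper are assumptions for illustration; Miniworld's own implementation may differ.

```python
import math
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    # Eight discrete actions; names and order are assumptions for this sketch.
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_FORWARD = 2
    MOVE_BACK = 3
    PICKUP = 4
    DROP = 5
    TOGGLE = 6
    DONE = 7


def apply_action(x: float, z: float, heading_deg: float, action: Action,
                 step: float = 0.15, turn_deg: float = 15.0) -> Tuple[float, float, float]:
    """Update a pose on the floor plane; the vertical axis never changes."""
    if action == Action.TURN_LEFT:
        heading_deg += turn_deg  # in this sketch, turning left increases the heading
    elif action == Action.TURN_RIGHT:
        heading_deg -= turn_deg
    elif action in (Action.MOVE_FORWARD, Action.MOVE_BACK):
        sign = 1.0 if action == Action.MOVE_FORWARD else -1.0
        rad = math.radians(heading_deg)
        x += sign * step * math.cos(rad)
        z += sign * step * math.sin(rad)
    # PICKUP, DROP, TOGGLE and DONE leave the pose unchanged here.
    return x, z, heading_deg % 360.0


x, z, heading = 0.0, 0.0, 0.0
plan = [Action.MOVE_FORWARD] + [Action.TURN_LEFT] * 6 + [Action.MOVE_FORWARD]
for a in plan:
    x, z, heading = apply_action(x, z, heading, a)
print(round(x, 3), round(z, 3), heading)  # 0.15 0.15 90.0
```

Keeping the action set discrete while positions stay continuous is what lets the same policy-network output head be reused across the 2D and 3D settings.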

Unified API

A shared API design abstracts away differences between 2D (Minigrid) and 3D (Miniworld) representations, enabling seamless transfer of code, learning setups, and policy architectures. Environment generation for both follows a streamlined pattern:

  • Minigrid: _gen_grid() constructs the grid structure, places entities, and sets the agent's starting position (randomized in many environments).
  • Miniworld: _gen_world() defines rooms, object placement, and agent initialization.
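The generation pattern above is a template-method design: reset() prepares an empty layout and then delegates to an environment-specific hook. The sketch below assumes a hypothetical GridEnvBase class and RoomWithGoal subclass written in plain Python; only the hook name _gen_grid(width, height) is taken from Minigrid, and Miniworld's _gen_world() would play the same role for rooms and objects.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, Optional


class GridEnvBase(ABC):
    """Hypothetical base class: reset() delegates layout generation to a hook."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height

    def reset(self, seed: Optional[int] = None) -> Dict[str, object]:
        self.rng = random.Random(seed)  # seeded RNG makes layouts reproducible
        self.grid = [["empty"] * self.width for _ in range(self.height)]
        self.agent_pos = None
        self._gen_grid(self.width, self.height)  # subclass builds the layout
        assert self.agent_pos is not None, "_gen_grid must place the agent"
        return {"grid": self.grid, "agent_pos": self.agent_pos}

    @abstractmethod
    def _gen_grid(self, width: int, height: int) -> None:
        ...


class RoomWithGoal(GridEnvBase):
    def _gen_grid(self, width: int, height: int) -> None:
        # Surround the room with walls.
        for x in range(width):
            self.grid[0][x] = self.grid[height - 1][x] = "wall"
        for y in range(height):
            self.grid[y][0] = self.grid[y][width - 1] = "wall"
        # Fixed goal in the bottom-right interior corner.
        self.grid[height - 2][width - 2] = "goal"
        # Randomized agent start on any remaining interior cell.
        free = [(x, y) for y in range(1, height - 1) for x in range(1, width - 1)
                if self.grid[y][x] == "empty"]
        self.agent_pos = self.rng.choice(free)


env = RoomWithGoal(width=5, height=5)
obs = env.reset(seed=0)
print(obs["agent_pos"])
```

New tasks only override the hook, so reset(), seeding, and the rest of the interaction loop stay shared across environments.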

The unified design allows direct parameter transfer (such as actor/critic weights) between environments, a feature leveraged in transfer learning experiments.

3. Advances in Scalability: JAX-based Reimplementations

Recent advances address the computational bottlenecks of CPU-bound, Python-based simulation.

NAVIX

  • Architectural Overview: NAVIX is a JAX-based reimplementation of Minigrid that retains full compatibility with the original POMDP logic and environment dynamics. The environment tuple is $\mathcal{M} = (h, w, T, \mathcal{O}, \mathcal{A}, \mathcal{R}, d, O, R, \gamma, P)$, with all transitions computed in JAX.
  • Entity-Component-System Model (ECSM): Entities (e.g., agent, keys, doors) are decomposed into components (Position, Direction, Colour, etc.), processed by distinct systems (Transition, Observation, Reward).
  • Stateful Batch Simulation: Uses a jittable "timestep" representation $(t, o_t, a_t, r_{t+1}, \gamma_{t+1}, s_t, i_{t+1})$, which permits vectorizing and rolling out many environments in parallel via JAX primitives (vmap, lax.scan).
  • Throughput: On an Nvidia A100 80GB, NAVIX attains $\sim 670$ million steps per second using 2048 parallel agents, a $>200\,000\times$ speedup over the original MiniGrid (which achieves $\sim 3\,144$ steps/s). Throughput is computed as

$$S = \frac{N_{\text{agents}} \times \text{Steps}_{\text{per agent}}}{\text{Time}_{\text{taken}}}$$

(Pignatelli et al., 28 Jul 2024).

  • Autoreset and Markovian Rewards: Implements autoreset by embedding the reset logic in the main JAX step function, and uses a Markovian reward ($0$ on every step, $1$ at task completion).
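The entity-component-system split, the timestep record, and autoreset can be sketched without JAX. In the plain-Python version below, which is an illustration under stated assumptions rather than NAVIX code, frozen dataclasses stand in for components, pure functions stand in for the transition, observation, and reward systems, a discount of 0.0 marks a terminal step, and a list comprehension stands in for jax.vmap. The info field of the timestep is omitted, the observation is just the agent's position, the reset timestep's action of -1 is a placeholder, and actions follow Minigrid's ordering (0 = turn left, 1 = turn right, 2 = forward).

```python
from dataclasses import dataclass, replace
from typing import Tuple

FORWARD = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}  # 0=right,1=down,2=left,3=up


@dataclass(frozen=True)
class State:  # components of the agent and goal entities
    agent_pos: Tuple[int, int]
    agent_dir: int
    goal_pos: Tuple[int, int]
    width: int
    height: int


@dataclass(frozen=True)
class Timestep:  # (t, o_t, a_t, r_{t+1}, gamma_{t+1}, s_t); info omitted
    t: int
    observation: Tuple[int, int]
    action: int
    reward: float
    discount: float  # 0.0 marks a terminal transition
    state: State


def transition_system(state: State, action: int) -> State:
    if action == 0:
        return replace(state, agent_dir=(state.agent_dir - 1) % 4)
    if action == 1:
        return replace(state, agent_dir=(state.agent_dir + 1) % 4)
    dx, dy = FORWARD[state.agent_dir]
    x = min(max(state.agent_pos[0] + dx, 0), state.width - 1)
    y = min(max(state.agent_pos[1] + dy, 0), state.height - 1)
    return replace(state, agent_pos=(x, y))


def observation_system(state: State) -> Tuple[int, int]:
    return state.agent_pos  # simplified: position only


def reward_system(state: State) -> float:
    return 1.0 if state.agent_pos == state.goal_pos else 0.0  # Markovian 0/1 reward


def reset(initial: State) -> Timestep:
    return Timestep(0, observation_system(initial), -1, 0.0, 1.0, initial)


def step(ts: Timestep, action: int, initial: State) -> Timestep:
    if ts.discount == 0.0:  # autoreset: a terminal timestep is followed by a fresh episode
        return reset(initial)
    new_state = transition_system(ts.state, action)
    reward = reward_system(new_state)
    discount = 0.0 if reward > 0 else 1.0
    return Timestep(ts.t + 1, observation_system(new_state), action, reward, discount, new_state)


initial = State(agent_pos=(0, 0), agent_dir=0, goal_pos=(2, 0), width=3, height=3)
batch = [reset(initial) for _ in range(4)]  # stands in for a vmapped batch
rewards = []
for _ in range(3):
    batch = [step(ts, 2, initial) for ts in batch]
    rewards.append(batch[0].reward)
print(rewards)  # [0.0, 1.0, 0.0]
```

In JAX the if statements would become jax.lax.cond or jnp.where so that step stays jit-compatible, and the batch comprehension would become jax.vmap over a stacked state.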

XLand-MiniGrid

  • Meta-Learning Focus: Leverages a flexible rule/goals system for procedurally composing tasks as "task trees" with arbitrary compositional depth and dynamic rules, facilitating meta-RL experiments.
  • Stateless JAX API: Employs stateless, jit-compatible reset and step methods with full hardware acceleration (jax.vmap/pmap).
  • Scalability: Achieves millions of steps per second, scaling nearly linearly with the number of environments and devices.
  • Benchmarks: Provides millions of unique, pre-sampled task specifications (rulesets), with four preset difficulty levels (trivial, small, medium, high).
  • Mathematical Structure: The TimeStep includes a discount $\gamma \in [0,1]$ that indicates terminality; PPO training uses the standard clipped policy objective:

$$L_{\text{clip}}(\theta) = \mathbb{E}_{t}\left[ \min\left( r_t(\theta) \hat{A}_t,\, \text{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon) \hat{A}_t \right) \right]$$

with $r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ and $\hat{A}_t$ the advantage estimate at step $t$ (Nikulin et al., 2023).
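The clipped objective can be evaluated for a batch of timesteps with a few lines of plain Python. The sketch assumes log-probabilities from the new and old policies and precomputed advantage estimates are already available, and uses ε = 0.2, a common default rather than a value taken from XLand-MiniGrid; ppo_clip_objective is a hypothetical helper.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], epsilon: float = 0.2) -> float:
    """Sample-mean estimate of L_clip(theta) over a batch of timesteps."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # r_t(theta) = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return total / len(advantages)


logp_old = [math.log(0.5), math.log(0.5)]
logp_new = [math.log(0.9), math.log(0.2)]
advantages = [1.0, -1.0]
value = ppo_clip_objective(logp_new, logp_old, advantages)
print(round(value, 6))  # 0.2
```

In the example, the first ratio (1.8) is clipped to 1.2 and the second (0.4) contributes the clipped term 0.8 × (−1), so neither step can push the policy far from the old one. Optimizers minimize, so training code typically uses the negative of this quantity as the policy loss.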

4. Case Studies: Transfer Learning and Human Adaptation

The unified API of Minigrid and Miniworld supports both agent and human transfer learning experiments:

  • RL Policy Transfer: PPO agents are trained in a Minigrid environment, and their weights are transferred to corresponding Miniworld agents. Performance is measured by the area under the reward curve (AUC), with transfer effectiveness defined as

$$\text{Transfer\_Improvement} = \frac{\text{Transfer\_Learning\_AUC} - \text{Miniworld\_Learning\_AUC}}{\text{Miniworld\_Learning\_AUC}}$$

Optimal transfer was observed when only critic networks and mission embeddings were transferred without freezing their weights (Chevalier-Boisvert et al., 2023).

  • Human Studies: Experimental results with ten human subjects indicate that exposure to a Minigrid scenario before switching to Miniworld improved adaptation metrics, such as reward trajectory, even under randomized control schemes.
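The transfer metric and the selective weight transfer can be sketched as follows. The AUC here uses the trapezoidal rule over evenly spaced evaluation points (an assumption, since the exact AUC computation is not given above), the reward curves are made-up toy values rather than reported results, and the parameter-name prefixes in transfer_weights are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc(rewards: Sequence[float]) -> float:
    """Area under a reward curve sampled at unit intervals (trapezoidal rule)."""
    return sum((a + b) / 2.0 for a, b in zip(rewards, rewards[1:]))


def transfer_improvement(transfer_curve: Sequence[float],
                         baseline_curve: Sequence[float]) -> float:
    base = auc(baseline_curve)
    return (auc(transfer_curve) - base) / base


def transfer_weights(source: Dict[str, List[float]], target: Dict[str, List[float]],
                     prefixes: Tuple[str, ...] = ("critic.", "mission_embedding.")
                     ) -> Dict[str, List[float]]:
    # Copy only the matching parameter groups; everything else keeps its target init.
    merged = dict(target)
    for name, value in source.items():
        if name.startswith(prefixes) and name in target:
            merged[name] = value
    return merged


baseline = [0.0, 0.2, 0.4, 0.6]  # toy Miniworld-from-scratch curve
transfer = [0.0, 0.4, 0.6, 0.8]  # toy curve after Minigrid pre-training
improvement = transfer_improvement(transfer, baseline)
print(f"{improvement:.3f}")  # 0.556

source = {"critic.w": [1.0], "actor.w": [2.0], "mission_embedding.w": [3.0], "image_encoder.w": [4.0]}
target = {"critic.w": [0.0], "actor.w": [0.0], "mission_embedding.w": [0.0], "image_encoder.w": [0.0]}
merged = transfer_weights(source, target)
```

Because the copied critic and mission-embedding values stay in an ordinary, trainable parameter dictionary, they continue to be fine-tuned in Miniworld, which corresponds to the unfrozen setting described above.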

These results highlight the libraries' utility for probing cross-modal transfer and how prior experience affects adaptation when the geometry and observation modality change.

5. Community Impact and Research Ecosystem

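Minigrid and Miniworld have become standard environments in the RL community, supporting research in areas such as curriculum learning, language-conditioned policies, exploration, transfer learning, and meta-learning.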

Project popularity is reflected in over 2400 GitHub stars and 620 forks for Minigrid. Open-source code and documentation are actively maintained by the Farama Foundation, with ongoing updates to support evolving research needs.

The JAX-based successors (NAVIX, XLand-MiniGrid) have democratized rapid experimentation by removing throughput bottlenecks, enabling single-GPU research to reach millions of steps per second. This shortens experimental cycles, reduces wall-clock training time from weeks to minutes, and lowers the bar for prototyping advanced RL ideas (Pignatelli et al., 28 Jul 2024; Nikulin et al., 2023).
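The reported step rates and the NAVIX throughput formula S = N_agents × Steps_per_agent / Time_taken support a quick back-of-the-envelope check. The step rates below are the approximate values quoted for MiniGrid and NAVIX; the 10-billion-step budget and the 2048-agent, 2-second run passed to throughput are hypothetical inputs, and only environment stepping is counted, so end-to-end training (network updates, logging) would take longer.

```python
MINIGRID_SPS = 3_144        # ~steps/s reported for the original MiniGrid
NAVIX_SPS = 670_000_000     # ~steps/s reported for NAVIX, 2048 agents on one A100


def throughput(n_agents: int, steps_per_agent: int, seconds: float) -> float:
    # S = N_agents * Steps_per_agent / Time_taken
    return n_agents * steps_per_agent / seconds


speedup = NAVIX_SPS / MINIGRID_SPS

# Hypothetical budget of 10 billion environment steps (stepping only).
budget = 10_000_000_000
weeks_cpu = budget / MINIGRID_SPS / (3600 * 24 * 7)
seconds_gpu = budget / NAVIX_SPS

print(f"speedup ~ {speedup:,.0f}x")                     # speedup ~ 213,104x
print(f"{weeks_cpu:.1f} weeks vs {seconds_gpu:.1f} s")  # 5.3 weeks vs 14.9 s
print(throughput(2048, 1_000, 2.0))                     # 1024000.0
```

The ratio of roughly 213,000 is consistent with the ">200 000×" figure, and the weeks-versus-seconds gap for pure simulation explains how whole training runs can shrink to minutes once learner updates are added back.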

6. Comparative Summary

| Library | Core API | Rendering | Batch/Parallelization | Main Research Domain |
|---|---|---|---|---|
| Minigrid | Python, Gym | 2D, NumPy | Classic, CPU (Python) | Curriculum/meta-learning, RL |
| Miniworld | Python, Gym | 3D, Pyglet | Classic, CPU (Python) | Goal-oriented spatial RL |
| XLand-MiniGrid | JAX | N/A | GPU/TPU, millions of steps/s | Meta-RL, benchmarks, scaling |
| NAVIX | JAX | N/A | GPU, 2048 parallel agents | Batched scalable RL |

Minigrid/Miniworld prioritize interpretability and extensibility in classic RL setups, while NAVIX and XLand-MiniGrid address the need for scalable, accelerator-native simulation.

7. Source Code Availability and Licensing

All libraries are released under the Apache-2.0 license with extensive documentation and tutorials available at https://minigrid.farama.org/ and https://miniworld.farama.org/, ensuring accessibility and ongoing community extension.
