Crafter Environment Testbed

Updated 12 November 2025

Crafter Environment is a suite of 2D grid-based reinforcement learning testbeds that pose challenges such as deep exploration, sparse rewards, and structured achievement learning.
It spans multiple variants (canonical, object-oriented, optimized, and Dojo versions) that provide diverse state representations and action spaces for robust agent evaluation.
The environment supports rigorous evaluation protocols, including achievement-based scoring and out-of-distribution (OOD) generalization tests, and has driven advances in autonomous skill acquisition and foundation-model research.

Crafter Environment is a family of 2D, grid-based open-world reinforcement learning testbeds designed to probe deep exploration, sparse-reward learning, generalization, and symbolic or pixel-based world modeling. It has become a reference environment for research targeting long-horizon autonomous skill acquisition, achievement-based reward structures, model-based RL, and foundation models for embodied agents. Variants include the canonical Crafter, symbolic extensions (Crafter-OO), optimized reimplementations (Craftax), and specialized benchmark toolkits (CrafterDojo). This entry provides a rigorous account of the environment’s structure, semantics, and its role in contemporary agent research.

1. State, Action, and Observation Specification

Crafter is formalized as a finite- or infinite-horizon partially observable MDP $(S, A, T, R, O, \gamma)$, where:

State Space $S$:
- In the canonical version, $S$ comprises a $64 \times 64$ grid representing terrain, objects, creatures (player, zombies, skeletons, cows), day-night cycles, inventory, and player internal status (health, food, water, sleep/energy) (Hafner, 2021, Stanić et al., 2022).
- In the object-oriented extension Crafter-OO, $S$ is a Pydantic WorldState with hierarchical fields:
  - player (position, facing direction, inventory, achievements, survival stats),
  - objects (list of dynamic entities, each with type-specific attributes),
  - materials (2D array of terrain tokens),
  - additional internals (chunks for spatial partitioning, daylight, RNG state) (Khan et al., 14 Oct 2025).
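The hierarchical WorldState can be pictured with a short, dependency-free sketch. Crafter-OO itself uses Pydantic models; this version substitutes standard-library dataclasses, and every class name, field name, default (e.g., survival stats starting at 9), and the `materials[x][y]` indexing are illustrative assumptions rather than the actual Crafter-OO schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Player:
    # Hypothetical fields mirroring the prose: position, facing, inventory, achievements, stats.
    position: Tuple[int, int]
    facing: Tuple[int, int]  # unit vector, e.g. (0, 1)
    inventory: Dict[str, int] = field(default_factory=dict)
    achievements: List[str] = field(default_factory=list)
    health: int = 9  # assumed starting values for the survival stats
    food: int = 9
    drink: int = 9
    energy: int = 9


@dataclass
class WorldObject:
    kind: str  # e.g. "zombie", "skeleton", "cow"
    position: Tuple[int, int]
    attributes: Dict[str, Any] = field(default_factory=dict)  # type-specific extras


@dataclass
class WorldState:
    player: Player
    objects: List[WorldObject]
    materials: List[List[str]]  # terrain tokens, indexed materials[x][y] (assumption)
    daylight: float = 1.0
    rng_state: Any = None  # serialized RNG state, kept so transitions can be replayed

    def material_at(self, x: int, y: int) -> str:
        return self.materials[x][y]
```

Calling `asdict(state)` yields a nested plain-dict view of the whole world, loosely analogous to dumping a Pydantic model.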
Action Space $A$:
- Discrete, with 17 actions in the canonical setting: a no-op, 4 moves, sleep, a general “do” interaction (collect, mine, eat, drink, attack), 4 place actions, and 6 make (craft) actions (Hafner, 2021, Stanić et al., 2022).
- In Crafter-OO, named primitives include “Move North/East/South/West,” “Do” (general interaction), “Make X,” “Place Y,” “Sleep” (Khan et al., 14 Oct 2025).
- Extended in Craftax to up to 43 actions to cover richer mechanics (descend, use potions, cast spells) (Matthews et al., 26 Feb 2024).
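For reference, the canonical 17-action space can be written as an indexed list. The action names below are assumed to match the identifiers in Crafter's `data.yaml` and should be checked against the installed version; the `group_actions` helper is purely illustrative.

```python
from typing import Dict, List

# Canonical Crafter action names, assumed to match the order in Crafter's data.yaml
# (verify against your installed crafter version). The list index is the discrete action id.
CRAFTER_ACTIONS: List[str] = [
    "noop",
    "move_left", "move_right", "move_up", "move_down",
    "do", "sleep",
    "place_stone", "place_table", "place_furnace", "place_plant",
    "make_wood_pickaxe", "make_stone_pickaxe", "make_iron_pickaxe",
    "make_wood_sword", "make_stone_sword", "make_iron_sword",
]


def group_actions(actions: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: bucket action ids by name prefix (move/place/make/other)."""
    groups: Dict[str, List[int]] = {"move": [], "place": [], "make": [], "other": []}
    for action_id, name in enumerate(actions):
        prefix = name.split("_")[0]
        groups[prefix if prefix in groups else "other"].append(action_id)
    return groups
```

Grouping the list recovers the composition described above: 4 moves, 4 place actions, 6 make actions, and the remaining no-op, “do”, and sleep.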
Observation Space $O(s)$:
- Raw RGB images ($64 \times 64 \times 3$) displaying an egocentric window, nearby entities, and a HUD overlay encoding inventory and survival bars (Hafner, 2021, Stanić et al., 2022).
- Symbolic variants expose a $7 \times 9$ or $9 \times 11$ grid window and a flat vector of inventory, status, and context (length 1345–8268) (Matthews et al., 26 Feb 2024).
- In object-centric architectures, CNNs parse observations into local patches for slot- or attention-based policy components (Stanić et al., 2022).
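The symbolic window can be illustrated by cropping an egocentric patch around the player from a grid of terrain tokens. This is a sketch under assumptions: the function name, the `"void"` padding token for off-map cells, and the rows-by-columns convention are hypothetical, and the default 7×9 shape simply mirrors the smaller window size quoted above.

```python
from typing import List


def egocentric_window(grid: List[List[str]], row: int, col: int,
                      height: int = 7, width: int = 9,
                      pad: str = "void") -> List[List[str]]:
    """Crop a height x width window centred on (row, col); off-map cells become `pad`."""
    n_rows, n_cols = len(grid), len(grid[0])
    top, left = row - height // 2, col - width // 2
    window = []
    for r in range(top, top + height):
        window.append([
            grid[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else pad
            for c in range(left, left + width)
        ])
    return window
```

Padding rather than clipping keeps the window shape fixed at the map edges, which is what a fixed-size symbolic observation vector requires.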

2. Dynamics, Stochasticity, and Transition Function

Transition Function $T(s,a)$
- Deterministic given the RNG seed in the original Crafter (Hafner, 2021); from the agent's perspective, however, transitions are stochastic, since random enemy spawns, attack outcomes, and resource drops are drawn from the world RNG in both pixel-based and symbolic variants (Khan et al., 14 Oct 2025, Burchi et al., 5 Jul 2025).
- In Crafter-OO, $T: S \times A \to \Delta(S)$ is a pure (functional) operator:
  1. Construct an imperative simulator from $s_t$, including the serialized RNG,
  2. Apply $a_t$ and advance the world one tick,
  3. Convert the next simulator state into a new declarative $s_{t+1}$ (Khan et al., 14 Oct 2025).
Stochasticity is encoded in the RNG state captured at each step: restoring the same state reproduces a transition exactly, while advancing the RNG state yields fresh stochastic outcomes.
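The three-step functional transition and the role of the captured RNG state can be illustrated with a toy world. Everything here (`ToyState`, the 0.3 spawn probability, the two-action move table) is hypothetical and far simpler than Crafter-OO; the point is only that threading the RNG state through an immutable state makes `step` a pure function.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyState:
    player_x: int
    zombies: int
    rng_state: Tuple  # snapshot from random.Random.getstate()


def initial_state(seed: int) -> ToyState:
    return ToyState(player_x=0, zombies=0, rng_state=random.Random(seed).getstate())


def step(state: ToyState, action: str) -> ToyState:
    # 1. Rebuild an imperative simulator (here just an RNG) from the declarative state.
    rng = random.Random()
    rng.setstate(state.rng_state)
    # 2. Apply the action and advance the world one tick, with a stochastic spawn.
    x = state.player_x + {"left": -1, "right": 1}.get(action, 0)
    spawned = 1 if rng.random() < 0.3 else 0
    # 3. Serialize the result, including the advanced RNG, into a new immutable state.
    return ToyState(player_x=x, zombies=state.zombies + spawned, rng_state=rng.getstate())
```

Calling `step(s, "right")` twice on the same `s` returns equal states, while chaining steps draws fresh spawn outcomes from the advancing RNG.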
Environments reset either routinely (when the player dies or the episode reaches its step limit, 10,000 steps by default in Crafter) or, in Craftax, via “optimistic resets” that maximize throughput (Matthews et al., 26 Feb 2024).

3. Achievement-Based Sparse Reward and Scoring

Reward Structure
- The principal extrinsic reward is achievement-based: each of $N=22$ achievements (collect, craft, place, combat, survive) yields $+1$ only on its first unlock per episode (Hafner, 2021, Stanić et al., 2022).
- Survival shaping: $\pm 0.1$ per health point gained/lost at each step.
- Extended versions scale up to 65 weighted achievements in Craftax (Matthews et al., 26 Feb 2024).
- Reward function (canonical):
$r(s_t,a_t) = \sum_{i=1}^{22} \mathbb{I}_i(t) + 0.1 \Delta H_t$

where $\mathbb{I}_i(t) \in \{0, 1\}$ equals 1 exactly when achievement $i$ is unlocked for the first time in the episode at step $t$, and $\Delta H_t$ is the change in health at step $t$ (Hafner, 2021).
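The canonical reward can be transcribed almost directly. In this sketch the function name and its argument layout (the set of achievements already unlocked this episode, the achievements triggered at the current step, and health before and after the step) are assumptions for illustration, not Crafter's internal API.

```python
from typing import Set, Tuple


def step_reward(unlocked_so_far: Set[str], triggered_now: Set[str],
                health_before: float, health_after: float) -> Tuple[float, Set[str]]:
    """r_t = sum_i I_i(t) + 0.1 * delta_H_t, where I_i(t) = 1 only on a first unlock."""
    first_unlocks = triggered_now - unlocked_so_far  # repeats earn nothing
    reward = len(first_unlocks) + 0.1 * (health_after - health_before)
    return reward, unlocked_so_far | first_unlocks
```

Returning the updated unlock set keeps the function stateless, so the caller threads per-episode progress explicitly and resets it to an empty set at each episode start.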
Aggregate Score
- Per-episode performance is measured by number of distinct achievements unlocked.
- Cross-episode report:
$\mathrm{Score} = \exp\left(\frac{1}{N} \sum_{i=1}^N \ln(1+s_i)\right) - 1$

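with $s_i$ the per-achievement success rate (percentage of episodes in which achievement $i$ is unlocked) (Hafner, 2021, Stanić et al., 2022).
- Because the score averages in log space (a geometric mean of $1+s_i$), rare, deep achievements weigh heavily: raising one achievement from 0% to 1% adds $\ln 2 \approx 0.69$ to the summed log terms, whereas raising another from 90% to 91% adds only about $0.011$.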
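The aggregate score can be computed from per-episode unlock records as follows. The helper names and the representation of an episode as the set of achievements it unlocked are assumptions of this sketch; success rates are expressed in percent, matching the definition of $s_i$.

```python
import math
from typing import List, Sequence, Set


def success_rates(episodes: Sequence[Set[str]], achievements: Sequence[str]) -> List[float]:
    """s_i: percentage of episodes in which achievement i was unlocked at least once."""
    return [100.0 * sum(a in ep for ep in episodes) / len(episodes) for a in achievements]


def crafter_score(rates_percent: Sequence[float]) -> float:
    """exp(mean_i ln(1 + s_i)) - 1, i.e. a shifted geometric mean of the success rates."""
    mean_log = sum(math.log(1.0 + s) for s in rates_percent) / len(rates_percent)
    return math.exp(mean_log) - 1.0
```

With two achievements, `crafter_score([1, 90])` (about 12.5) exceeds `crafter_score([0, 91])` (about 8.6), illustrating how the log average rewards progress on rare achievements.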

4. Environment Variants, Extensions, and Optimization

Crafter-OO exposes symbolic, object-oriented state and actions with a pure functional transition, supporting programmatic law induction and world-modeling for compositional RL and planning (Khan et al., 14 Oct 2025).
Craftax-Classic/Craftax ("JAX Crafter"): Craftax-Classic is a JAX reimplementation of Crafter with $>250\times$ throughput that symbolically exposes grids and inventory; Craftax further extends task complexity with new achievements, nine procedural floors, NetHack-inspired mechanics, and a weighted reward (Matthews et al., 26 Feb 2024).
CrafterDojo: A foundation model/behavioral benchmark suite providing expert policies, contrastive video-text data, vision-language embedding models (CrafterVPT, CrafterCLIP), and instruction-following agents (CrafterSteve-1). This toolkit supports rapid prototyping and benchmarking for foundation models under the Crafter environment (Park et al., 19 Aug 2025).
CrafterOOD: Out-of-distribution variants modifying object appearance (texture/color variants) and object count, to rigorously probe generalization of policies and inductive biases (Stanić et al., 2022).

5. Evaluation Protocols and Core Benchmarks

Training/Evaluation Budgets: The standard protocol allocates 1 million environment steps for RL agents (Hafner, 2021), with ablations extending to 10+ million steps, or up to a billion frames in Craftax (Matthews et al., 26 Feb 2024, Park et al., 19 Aug 2025).
Metrics:
- Per-achievement unlock rate $s_i$ .
- Aggregate geometric-mean “Crafter score.”
- In Crafter-OO, additional metrics include:
  - State Ranking: Given a set $C$ of next-state candidates (one ground-truth $s'$ plus $K$ distractors generated by illegal mutations), compute over $N$ evaluation transitions
$\text{Rank@1} = \frac{1}{N} \sum_{i=1}^N \mathbf{1}[r_i = 1], \quad \text{MRR} = \frac{1}{N} \sum_{i=1}^N \frac{1}{r_i}$

where $r_i$ is the rank of the ground-truth $s'$ within the $i$-th candidate set (Khan et al., 14 Oct 2025).
  - State Fidelity: Edit distance between the predicted and real next state, normalized by state size:
$\text{Normalized ED} = \frac{\text{ED}(s', \hat{s}')}{|\text{elements}(s')|}$
  - Empirical Protocols: Scripted scenario sweeps test all mechanics, including failure cases.
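The Crafter-OO metrics reduce to a few lines of code once candidates are scored and states are flattened into token sequences. In this sketch the scoring of candidates, the optimistic tie-breaking in `rank_of_truth`, the flattening of a state into a token list, and the use of Levenshtein distance as the edit distance are all assumptions; the source does not specify these details.

```python
from typing import Hashable, Sequence


def rank_of_truth(scores: Sequence[float], truth_index: int) -> int:
    """1-based rank of the ground-truth candidate under descending scores (ties favour it)."""
    return 1 + sum(1 for s in scores if s > scores[truth_index])


def rank_at_1(ranks: Sequence[int]) -> float:
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    return sum(1.0 / r for r in ranks) / len(ranks)


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Classic dynamic-programming edit distance over token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def normalized_edit_distance(true_next: Sequence[Hashable],
                             predicted_next: Sequence[Hashable]) -> float:
    return levenshtein(true_next, predicted_next) / len(true_next)
```

A stricter evaluation would break score ties pessimistically (counting `>=` instead of `>`), which can only lower Rank@1 and MRR.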
Asymptotic and OOD Generalization: Experiments test agent performance as object distributions shift, or as training is extended far beyond initial performance convergence (Stanić et al., 2022).

Environment	State representation	Achievements	Score metric	Notable extension	Reference
Crafter	Pixel ($64 \times 64 \times 3$ RGB)	22	Geometric mean	n/a (original)	(Hafner, 2021)
Crafter-OO	Symbolic, object-oriented	23	Rank@1, ED	Programmatic world model	(Khan et al., 14 Oct 2025)
Craftax-Classic	Symbolic/pixel	22	% unlocked	$250\times$ speed-up	(Matthews et al., 26 Feb 2024)
CrafterDojo	Both	22	Foundation model	Behavioral priors, VLM	(Park et al., 19 Aug 2025)
CrafterOOD	Pixel	22	OOD success rates	Generalization splits	(Stanić et al., 2022)

6. Research Impact and Key Empirical Findings

The Crafter family has become a canonical testbed for:

Sparse reward and deep exploration: The hierarchical achievement graph—with bottlenecks (e.g., crafting rare tools, mining diamonds)—exposes exploration challenges unsolved by simple curiosity or count-based methods (Hafner, 2021, Stanić et al., 2022, Ferrao et al., 26 Mar 2025).
Structured skill learning: Methods such as Structured Exploration with Achievements (SEA) recover the achievement dependency DAG directly from offline data and exploit this structure for hierarchical exploration, yielding nontrivial unlock rates on milestones unreachable by PPO, RND, or DreamerV2 (e.g., 49% mean unlock of hard-set achievements vs 0% for IMPALA baselines) (Zhou et al., 2023).
World modeling: Advanced world-model architectures (e.g., DreamerV3, $\Delta$-IRIS, EMERALD) have successively set the state of the art on the geometric-mean score, with the spatially structured MaskGIT-transformer model EMERALD attaining 58.1%, the first to outperform human experts within 10M steps (Burchi et al., 5 Jul 2025).
Foundation models and vision-language integration: CrafterDojo’s foundation suite demonstrates that behavior priors (CrafterVPT), contrastive video-text encoders (CrafterCLIP), and instruction-following heads (CrafterSteve-1) enable rapid agent development, transfer, and grounded, explainable control (Park et al., 19 Aug 2025).
OOD generalization: CrafterOOD exposes the limitations of vanilla PPO and shows that, in its experiments, robust zero-shot adaptation was achieved only by object-centric agents with explicit attention-based inductive biases (Stanić et al., 2022).
Symbolic world model induction: Crafter-OO enables compositional, programmatic law extraction and probabilistic modeling, supporting conditional inference, dynamic computation graphs, and planning in stochastic settings within the OneLife framework (Khan et al., 14 Oct 2025).
Efficiency and Reproducibility: Craftax permits billion-step-scale experiments in under an hour on a single GPU and provides standardized logging, seeding, and data APIs (Matthews et al., 26 Feb 2024).
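The idea of recovering an achievement dependency graph from offline data, as SEA does, can be conveyed with a much simpler precedence heuristic. This is explicitly not SEA's procedure: here an edge $a \to b$ is kept when $a$ preceded $b$ in every episode that unlocked $b$ (subject to a hypothetical minimum support), followed by a transitive reduction. Function names, the support threshold, and the toy episodes are illustrative.

```python
from itertools import permutations
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def infer_prerequisites(episodes: List[List[str]], min_support: int = 2) -> Set[Edge]:
    """Edge (a, b): a was unlocked before b in every episode that unlocked b.

    Each episode is its list of achievements in first-unlock order; b must occur in at
    least `min_support` episodes so that one-off orderings are not mistaken for structure.
    """
    names = sorted({a for ep in episodes for a in ep})
    edges: Set[Edge] = set()
    for a, b in permutations(names, 2):
        with_b = [ep for ep in episodes if b in ep]
        if len(with_b) >= min_support and all(
                a in ep and ep.index(a) < ep.index(b) for ep in with_b):
            edges.add((a, b))
    return edges


def transitive_reduction(edges: Set[Edge]) -> Set[Edge]:
    """Drop (a, c) whenever some b gives a -> b -> c, leaving only direct prerequisites."""
    nodes = {n for edge in edges for n in edge}
    return {(a, c) for (a, c) in edges
            if not any((a, b) in edges and (b, c) in edges for b in nodes)}
```

On episodes such as `["collect_wood", "place_table", "make_wood_pickaxe"]`, the reduced graph keeps `collect_wood -> place_table -> make_wood_pickaxe` and drops the implied shortcut edge.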

7. Open Problems, Limitations, and Future Directions

Key research challenges and future avenues with Crafter:

Extending achievement graphs: Automated discovery of new achievement types, dynamic goal insertion, and hierarchical skill composition remain underdeveloped, particularly in variants like Craftax with 65+ achievements (Matthews et al., 26 Feb 2024).
Language grounding and generalization: Integrating language instructions, text-based hints (e.g., potion effects), and meta-learning structures is ongoing (Park et al., 19 Aug 2025).
Scaling symbolic modeling: The synthesis of scalable, interpretable, and performant symbolic models from unguided exploration is not yet robust under heavy stochasticity or densely coupled rules (Khan et al., 14 Oct 2025).
Memory and planning: Most benchmarks show clear gains moving from feedforward to recurrent/attention models, but efficient architectures for long-horizon memory in open worlds remain open.
Intrinsic motivation: Count-, change-, and curiosity-based bonuses (e.g., CBET) can augment sparse extrinsic reward in Crafter, but risk misalignment if proxies deviate from the achievement structure (Ferrao et al., 26 Mar 2025).
Open-endedness and unsolved frontiers: Extended Craftax variants remain unsolved by PPO, PPO-RNN, and intrinsic/UED baselines beyond basic/intermediate achievement categories (even after $10^{10}$ steps), highlighting a persistent exploration gap (Matthews et al., 26 Feb 2024).
Benchmark limitations: The 2D top-down nature and reliance on inventory overlays can obscure memory challenges, and stock Crafter may underprobe certain aspects of perception or conative flexibility (Stanić et al., 2022).
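As a concrete instance of the count-based bonuses mentioned under intrinsic motivation, the sketch below adds $\beta / \sqrt{N(k)}$ for a state key $k$ visited $N(k)$ times. It is a generic count-based bonus, not CBET (which rewards environment changes); the state abstraction (any hashable key, such as position plus inventory) and the default $\beta = 0.1$ are assumptions. Because the bonus tracks novelty rather than the achievement graph, it also illustrates the misalignment risk noted above.

```python
import math
from collections import Counter
from typing import Hashable


class CountBonus:
    """Count-based exploration bonus beta / sqrt(N(k)) over a hashable state key k."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta
        self.counts: Counter = Counter()

    def __call__(self, key: Hashable) -> float:
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


def shaped_reward(extrinsic: float, bonus: CountBonus, key: Hashable) -> float:
    # Total reward = sparse achievement reward + decaying novelty bonus.
    return extrinsic + bonus(key)
```

The bonus decays with repeated visits, so an agent that revisits the same abstracted state receives progressively less intrinsic reward regardless of whether that state lies on the path to a new achievement.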

Crafter and its symbolic, optimized, and generalization-centric descendants set the standard for evaluating exploration, long-horizon planning, structured achievement learning, and foundation-model-based control in embodied RL. Their open-source reference implementations, composable API structures, and comprehensive benchmark regimes have substantially guided research in model-based RL, symbolic induction, and embodied foundation models.