Crafter-OO: Object-Oriented Benchmark

Updated 16 October 2025

The Crafter-OO environment is a procedurally generated, object-oriented benchmark that uses hierarchical, symbolic state representations to analyze agent generalization and planning.
Its modular laws decompose environment dynamics into precondition and effect functions, enabling efficient symbolic world modeling and plan synthesis.
The environment incorporates rigorous agent evaluation protocols, employing achievement-based metrics and state fidelity measures to assess behavioral and structural performance.

Crafter-OO is an object-oriented, open-ended, and programmatically accessible version of the Crafter world, a procedurally generated 2D survival environment that serves as a benchmark for broad agent capabilities, including generalization, planning, exploration, and symbolic reasoning. Crafter-OO formalizes agent–environment interaction using structured, hierarchical state representations and exposes a pure transition function that facilitates symbolic world modeling, plan synthesis, and scalable benchmarking. As documented in recent literature, Crafter-OO has been used to analyze agent generalization (Stanić et al., 2022), to support development of modular foundation models (Park et al., 19 Aug 2025), and as the principal environment for the symbolic modeling framework OneLife (Khan et al., 14 Oct 2025), which infers executable world laws from unguided exploration.

1. Environment Structure and Object-Oriented State Representation

Crafter-OO extends the original Crafter environment by exposing an explicit, object-oriented, symbolic state. The world is represented as a hierarchical structure (commonly serialized as a JSON object), decomposing the environment into discrete objects (e.g., player, cow, tree, zombie, rock) with associated attributes. Each attribute (such as position, health, inventory contents, type, and interaction states) is made directly accessible for analysis and modeling. The transition dynamics are reimplemented as a pure function

$T : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$

mapping the current structured state $s \in \mathcal{S}$ and action $a \in \mathcal{A}$ to a probability distribution over next states.

This structured state visibility enables fine-grained reasoning, symbolic manipulation, and identification of causal relationships among objects. Such a design is critical for research in interpretable world modeling, planning, and efficient evaluation of generalization across a spectrum of task variants (Khan et al., 14 Oct 2025).
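To make the idea of a hierarchical state and a pure transition function concrete, here is a minimal Python sketch. The field names, the `transition` function, the adjacency rule, and the 0.9 success probability are all illustrative assumptions, not the actual Crafter-OO schema or dynamics.

```python
import copy

# Hypothetical state layout; the real Crafter-OO schema may differ.
state = {
    "player": {"position": [3, 4], "health": 9, "inventory": {"wood": 0}},
    "objects": [
        {"type": "tree", "position": [3, 5]},
        {"type": "zombie", "position": [7, 1], "health": 5},
    ],
}


def adjacent(p, q):
    """True when two grid positions are exactly one step apart (Manhattan distance 1)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


def transition(s, action):
    """Pure function: returns a list of (probability, next_state) pairs.

    The input state is never mutated; every outcome is a fresh copy.
    """
    if action == "move_right":
        nxt = copy.deepcopy(s)
        nxt["player"]["position"][0] += 1
        return [(1.0, nxt)]
    if action == "collect":
        near_tree = any(
            o["type"] == "tree" and adjacent(o["position"], s["player"]["position"])
            for o in s["objects"]
        )
        if near_tree:
            success = copy.deepcopy(s)
            success["player"]["inventory"]["wood"] += 1
            # Made-up stochastic rule: collection succeeds 90% of the time.
            return [(0.9, success), (0.1, copy.deepcopy(s))]
    # Unknown or inapplicable actions leave the state unchanged.
    return [(1.0, copy.deepcopy(s))]


for prob, nxt in transition(state, "collect"):
    print(prob, nxt["player"]["inventory"])
```

Because the function returns new states instead of mutating its input, the same state can be queried repeatedly under different actions, which is what makes symbolic analysis and planning straightforward.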

2. Modular Laws and Programmatic Transition Function

Central to the Crafter-OO design is the decomposition of environment dynamics into modular, programmatic “laws.” Each law is formulated as a conditionally activated program composed of two components:

Precondition function $c_i(s, a)$: Determines activation, e.g., whether an object (such as a zombie) is present and relevant for the current action.
Effect function $e_i(s, a)$: Encodes the transition logic, e.g., sampling the next position or updating health.

The transition function is expressed as the composition of all activated laws. Probabilistic predictions for each observable $o$ are computed as a weighted product over the active laws:

$p(o = v\,|\,s,a; \Theta) \propto \prod_{i \in I_o(s,a)} \phi_i(o = v\,|\,s,a)^{\theta_i}$

where $I_o(s,a)$ is the set of laws governing $o$, $\phi_i$ is the law’s probability output, and $\theta_i$ is a learnable law weight.

This modularity allows efficient learning in large, stochastic, and hierarchically structured environments, with inference and optimization routed only through relevant laws (Khan et al., 14 Oct 2025). Laws can be synthesized by LLMs from observed state transitions, then validated or refined via gradient-based credit assignment mechanisms.
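The weighted product above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `Law` dataclass, the `predict` helper, the two example laws, and their probabilities are hypothetical and are not taken from the OneLife codebase. Here the effect function returns the law's distribution $\phi_i$ over values of one observable.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Law:
    name: str
    observable: str                                   # which observable o this law governs
    precondition: Callable[[dict, str], bool]         # c_i(s, a)
    effect: Callable[[dict, str], Dict[int, float]]   # phi_i(o = v | s, a)
    weight: float = 1.0                               # theta_i


def predict(laws, observable, s, a, eps=1e-9):
    """Combine the active laws for one observable via a weighted product, then normalize."""
    active = [law for law in laws
              if law.observable == observable and law.precondition(s, a)]
    if not active:
        return {}
    outputs = [(law.weight, law.effect(s, a)) for law in active]
    support = set().union(*(dist.keys() for _, dist in outputs))
    # Work in log space: sum_i theta_i * log phi_i(v); eps guards against log(0).
    log_scores = {
        v: sum(w * math.log(max(dist.get(v, 0.0), eps)) for w, dist in outputs)
        for v in support
    }
    top = max(log_scores.values())
    unnorm = {v: math.exp(ls - top) for v, ls in log_scores.items()}
    z = sum(unnorm.values())
    return {v: u / z for v, u in unnorm.items()}


# Two made-up laws that both govern the player's health.
laws = [
    Law("zombie_attack", "health",
        precondition=lambda s, a: s["zombie_adjacent"],
        effect=lambda s, a: {s["health"] - 2: 0.8, s["health"]: 0.2}),
    Law("resting_is_safe", "health",
        precondition=lambda s, a: a == "noop",
        effect=lambda s, a: {s["health"]: 0.6, s["health"] - 2: 0.4}),
]

print(predict(laws, "health", {"health": 5, "zombie_adjacent": True}, "noop"))
```

Only laws whose precondition holds contribute to the product, so a prediction for one observable touches a small subset of the full law set.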

3. Agent Evaluation Protocols and Metrics

Agents in Crafter-OO are evaluated using both achievement-based and world-modeling protocols:

Achievement Structure: As in prior Crafter benchmarks, agents unlock semantically meaningful achievements (e.g., “Collect Wood,” “Craft Sword,” “Defeat Zombie”) during episodes. Achievement success rates and aggregate scores (geometric mean formula)

$S = \exp\left[\frac{1}{N} \sum_{i=1}^N \ln(1 + s_i)\right] - 1$

where $s_i$ is the success rate of achievement $i$ in percent and $N$ is the number of achievements, remain central as holistic performance measures (Hafner, 2021, Stanić et al., 2022).
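The score formula is easy to compute directly. The sketch below assumes success rates are given in percent (0 to 100), following the convention of the original Crafter benchmark; the function name `crafter_score` is chosen here for illustration.

```python
import math


def crafter_score(success_rates_percent):
    """Geometric-mean-style aggregate: exp(mean(ln(1 + s_i))) - 1, with s_i in percent."""
    n = len(success_rates_percent)
    return math.exp(sum(math.log(1.0 + s) for s in success_rates_percent) / n) - 1.0


# One achievement unlocked in 99% of episodes and one never unlocked.
print(crafter_score([99.0, 0.0]))
```

For rates of 99 and 0, the arithmetic mean would be 49.5, whereas this score is about 9. The logarithm rewards unlocking a broad set of achievements over excelling at a few.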

World Model Evaluation: Crafter-OO introduces protocols for assessing symbolic world models:
- State Ranking: Measures an agent’s or model’s ability to assign higher probability to the true next state compared with semantically plausible distractor states (mutated via illegal modifications). Metrics include Rank@1 and Mean Reciprocal Rank (MRR).
- State Fidelity: Quantifies the closeness between predicted and ground-truth states using edit distance metrics (raw and normalized by state size).

These protocols enable the rigorous evaluation of both low-level behavioral competence and high-level structural understanding in agents and models (Khan et al., 14 Oct 2025).
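As a rough illustration of the two world-model protocols, the following sketch computes Rank@1 and MRR from model scores and a simple field-level edit distance between nested states. The helper names are hypothetical, and counting differing leaf fields is only one plausible reading of "edit distance"; the benchmark's exact definition may differ.

```python
_MISSING = object()  # sentinel for fields present in only one of the two states


def rank_of_true(scores, true_index):
    """1-based rank of the true next state among candidates; ties count against the model."""
    true_score = scores[true_index]
    return 1 + sum(1 for i, sc in enumerate(scores)
                   if i != true_index and sc >= true_score)


def rank_at_1_and_mrr(examples):
    """examples: list of (candidate_scores, index_of_true_state) pairs."""
    ranks = [rank_of_true(scores, idx) for scores, idx in examples]
    rank_at_1 = sum(r == 1 for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return rank_at_1, mrr


def flatten(state, prefix=""):
    """Flatten nested dicts and lists into {path: leaf_value}."""
    flat = {}
    if isinstance(state, dict):
        for key, value in state.items():
            flat.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(state, list):
        for i, value in enumerate(state):
            flat.update(flatten(value, f"{prefix}/{i}"))
    else:
        flat[prefix] = state
    return flat


def state_edit_distance(pred, truth):
    """Return (raw, normalized): differing leaf fields, and that count over the truth's size."""
    p, t = flatten(pred), flatten(truth)
    raw = sum(1 for k in p.keys() | t.keys()
              if p.get(k, _MISSING) != t.get(k, _MISSING))
    return raw, raw / max(len(t), 1)


examples = [([0.7, 0.2, 0.1], 0), ([0.3, 0.5, 0.2], 0)]
print(rank_at_1_and_mrr(examples))

truth = {"player": {"health": 9, "pos": [3, 4]}}
pred = {"player": {"health": 8, "pos": [3, 4]}}
print(state_edit_distance(pred, truth))
```

In the first example the true state is ranked first; in the second it is ranked second, giving Rank@1 of 0.5 and MRR of 0.75.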

4. Generalization and Out-of-Distribution (OOD) Analysis

Crafter-OO supports systematic generalization analysis via environment variants:

CrafterOODapp: Alters object appearances and tests generalization to never-before-seen visual variants.
CrafterOODnum: Modifies numbers and distributions of resources and adversaries, challenging adaptability.

Object-centric agents equipped with inductive biases toward compositional representations—such as self-attention and cross-attention architectures—demonstrate improved robustness and interpretability in Crafter-OO and its OOD variants. These architectures bind object features to latent slots, facilitating transfer to novel scenarios. Benchmarked Crafter scores reveal that object-centric approaches attain state-of-the-art performance in both standard and OOD settings (Stanić et al., 2022).

5. Symbolic World Modeling and Planning

The symbolic, executable world model learned in Crafter-OO supports forward simulation and causal planning:

From a structured state, agents or planners can conduct rollouts under candidate action sequences.
The learned transition laws permit simulating multi-step strategies, ranking plans, and distinguishing effective from ineffective strategies (e.g., resource collection before combat vs. immediate engagement).
Empirical results demonstrate that plan rankings under OneLife’s model often match ground-truth rankings for complex scenarios, confirming high-level structural fidelity (Khan et al., 14 Oct 2025).

These capabilities establish a foundation for AI systems capable of abstract reasoning, interpretable planning, and autonomous adaptation in complex, stochastic environments.
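The rollout-and-rank procedure described above can be sketched as follows. The `toy_model` function stands in for a learned world model, and its rules, win probabilities, and the scoring function are invented for illustration; none of them come from OneLife or Crafter-OO.

```python
import random


def toy_model(state, action, rng):
    """Stand-in for a learned world model: samples one next state without mutating the input."""
    s = dict(state)
    if action == "collect_wood":
        s["wood"] += 1
    elif action == "craft_sword" and s["wood"] >= 1:
        s["wood"] -= 1
        s["sword"] = True
    elif action == "fight_zombie":
        win_prob = 0.9 if s["sword"] else 0.3  # made-up probabilities
        if rng.random() < win_prob:
            s["zombie_defeated"] = True
        else:
            s["health"] -= 3
    return s


def rollout(model, state, plan, rng):
    """Simulate one trajectory under a fixed action sequence and return the final state."""
    for action in plan:
        state = model(state, action, rng)
    return state


def plan_value(model, state, plan, score, n_samples=200, seed=0):
    """Monte Carlo estimate of the expected score of a plan under the model."""
    rng = random.Random(seed)
    total = sum(score(rollout(model, state, plan, rng)) for _ in range(n_samples))
    return total / n_samples


def rank_plans(model, state, plans, score):
    """Return plan names sorted from highest to lowest estimated value."""
    return sorted(plans,
                  key=lambda name: plan_value(model, state, plans[name], score),
                  reverse=True)


start = {"health": 9, "wood": 0, "sword": False, "zombie_defeated": False}
plans = {
    "prepare_then_fight": ["collect_wood", "craft_sword", "fight_zombie"],
    "fight_immediately": ["fight_zombie"],
}


def score(s):
    """Hypothetical objective: remaining health plus a bonus for defeating the zombie."""
    return s["health"] + (10 if s["zombie_defeated"] else 0)


print(rank_plans(toy_model, start, plans, score))
```

Comparing a ranking like this one, computed under a learned model, against the ranking obtained from the ground-truth simulator is one way to check whether the model captures high-level structure.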

6. Foundation Models, Toolkits, and Accessibility

Crafter-OO is equipped with toolkits and open-source resources for reproducible research:

The CrafterDojo suite introduces foundation models (CrafterVPT for behavioral priors, CrafterCLIP for vision–language alignment, CrafterSteve-1 for goal-conditioned instruction following), datasets (CrafterPlay, CrafterCaption), and benchmark protocols, collectively supporting rapid innovation and prototyping within the environment (Park et al., 19 Aug 2025). These models leverage synthetic expert trajectories and rule-based, LLM-augmented caption datasets, using architectures such as ResNet, Transformer-XL, and contrastive alignment objectives.
All relevant codebases are public, enhancing accessibility, benchmarking, and collaborative development.

7. Research Implications and Future Directions

Crafter-OO supports key trajectories in agent and environment research:

Enables the study of symbolic reasoning, world model inference, and structured planning from minimal or unguided data, supporting agent autonomy under realistic constraints.
Facilitates evaluation and comparison of induction-based, object-centric, and foundation-model approaches in a standardized, extensible, and interpretable testbed.
Opens avenues for research on modular inductive biases, scalable generalization, planning with explicit world knowledge, and integration with reinforcement learning and meta-learning paradigms.

Researchers are encouraged to extend Crafter-OO with new objects, interactions, and tasks, scale foundation models using behavioral and caption datasets, and employ advanced evaluation protocols—thus leveraging its structure for empirically grounded advances in embodied intelligence.

In summary, Crafter-OO is a programmatically accessible, object-oriented environment built to rigorously analyze agent capabilities, world modeling, generalization, and planning using structured states, modular laws, achievement protocols, and open-source toolkits. Its architecture and benchmarks facilitate interpretable, autonomous AI research, with demonstrated strengths in symbolic modeling, compositional generalization, and foundation model integration (Hafner, 2021, Stanić et al., 2022, Park et al., 19 Aug 2025, Khan et al., 14 Oct 2025).