World State Model: Graph-based Dynamics Learning

Updated 7 August 2025
  • World State Model is a formalism that captures graph-based actions and probabilistic state transitions in compositional environments using induced world programs.
  • It employs graph neural networks to embed states, derives action embeddings from the difference of consecutive state embeddings, and induces actions as graph rewriting rules over structured data.
  • The framework enables efficient model-based planning in domains such as chemical synthesis, achieving high success rates with strategies like Monte Carlo Tree Search.

A world state model refers to a formalism or learned representation that captures both the set of possible actions and the probabilistic dynamics governing state transitions in complex, often compositional, environments. The "World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces" framework introduces a paradigm in which the agent induces its own world model—termed a "world program"—by inferring both the structure of the action space and the underlying dynamics directly from observed sequences of state transitions. This approach eschews the traditional assumption of a known action space and specifically addresses environments where states and actions are best described as graphs or multisets of graphs, making the underlying representation compositional and highly expressive.

1. Formalization and Components of World Programs

The world program is defined as a triple (𝒜, 𝒜(s), 𝒯):

  • 𝒜: The set of all possible actions available to the agent. In the discussed formalism, these actions are not assumed a priori but are induced as graph rewriting rules, or subroutines, that describe how a state (a multiset of graphs) transforms into another.
  • 𝒜(s): A function mapping a given state s ∈ 𝒮 to the set of applicable actions, i.e., 𝒜(s): 𝒮 → ℘(𝒜), where ℘ denotes the power set. This captures the insight that, for compositional state spaces, not all actions are universally applicable but are context-dependent, determined by permissible edits and rewritings of the graph structure.
  • 𝒯: The state transition function, mapping (state, action, resulting state) triples to probabilities: 𝒯: 𝒮 × 𝒜 × 𝒮 → [0, 1]. This function captures the stochastic dynamics induced by actions applied to states.

Key equations highlighting what must be learned:

(1) 𝒜 (unknown)

(2) 𝒜(s): 𝒮 → ℘(𝒜)

(3) π: 𝒮 → 𝒜

(4) 𝒯: 𝒮 × 𝒜 × 𝒮 → [0, 1]

(5) ℛ: 𝒮 × 𝒜 → ℝ

Here, only the reward function ℛ is assumed known; the action set 𝒜, the state-dependent action mapping 𝒜(s), and the transition function 𝒯 must be induced from data, while the policy π is obtained by planning with the induced model.
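To make the triple concrete, the following is a minimal Python sketch of a world-program interface. The names (WorldProgram, applicable, transition) are illustrative rather than taken from the paper, and the callables stand in for the learned components:

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable   # e.g. a canonical encoding of a multiset of graphs
Action = Hashable  # e.g. an induced graph rewriting rule

@dataclass
class WorldProgram:
    """The induced triple (A, A(s), T); the reward R is supplied by the environment."""
    actions: Set[Action]                                  # A: grows as novel rules are observed
    applicable: Callable[[State], Set[Action]]            # A(s): S -> P(A)
    transition: Callable[[State, Action, State], float]   # T: S x A x S -> [0, 1]
```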

2. Dynamics Model Learning in Compositional Environments

In environments with compositional structure—such as chemical reaction networks or relational domains—states are represented as multisets of graphs. The agent collects batches of observed transitions (sₜ, sₜ₊₁), then hypothesizes the action (graph rewrite rule) that could have produced sₜ₊₁ from sₜ.

The induction process involves:

  • Enumerating plausible actions by analyzing the structural difference between sₜ and sₜ₊₁, typically as edits on graphs.
  • Sampling actions a ∈ 𝒜 and determining, by simulation or matching, whether applying a leads to sₜ₊₁.
  • Using this data to train a neural transition function 𝒯(sₜ, a, sₜ₊₁) as a binary classifier that predicts the validity or probability of the transition (a loop sketched below).
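A schematic version of this induction loop, with hypothetical helpers propose_edits and apply_rule standing in for the paper's graph-diff and rule-application machinery, might look like:

```python
def induce_actions(transitions, actions, propose_edits, apply_rule):
    """Grow the action set and collect labelled examples for the transition classifier.

    `propose_edits(s_t, s_next)` and `apply_rule(rule, s_t)` are hypothetical helpers:
    the first enumerates candidate rewrite rules from the structural diff between the
    two states, the second applies a rule and returns the set of resulting states.
    """
    dataset = []
    for s_t, s_next in transitions:
        for rule in propose_edits(s_t, s_next):                # candidate graph edits
            actions.add(rule)                                  # grow A with novel rules
            valid = s_next in apply_rule(rule, s_t)            # does the rule explain the step?
            dataset.append((s_t, rule, s_next, float(valid)))  # positive/negative label
    return dataset
```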

The neural implementation utilizes graph neural networks (GNNs) to embed states; the action embedding is formulated as the difference between the embeddings of the two consecutive states:

Embed(sₜ) − Embed(sₜ₊₁) → Action Embedding

The GNN is crucial for its relational inductive bias, which is necessary to model the compositionality and structural invariance of graph-based states.
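As an illustration of this design (not the paper's exact architecture), the sketch below pairs a minimal one-round message-passing encoder with a binary transition classifier, deriving the action embedding as the difference of the two state embeddings:

```python
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """Minimal message-passing encoder: one mean-aggregation round, sum-pooled readout."""
    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        self.msg = nn.Linear(node_dim, hidden_dim)
        self.upd = nn.Linear(node_dim + hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [n, node_dim] node features; adj: [n, n] 0/1 adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        m = (adj @ self.msg(x)) / deg                 # mean over neighbours
        h = torch.relu(self.upd(torch.cat([x, m], dim=-1)))
        return h.sum(dim=0)                           # permutation-invariant graph embedding

class TransitionModel(nn.Module):
    """Binary classifier approximating T(s_t, a, s_{t+1}) from state embeddings."""
    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        self.enc = GraphEncoder(node_dim, hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def action_embedding(self, s_t, s_next) -> torch.Tensor:
        # Action embedding as the difference of the two state embeddings.
        return self.enc(*s_t) - self.enc(*s_next)

    def forward(self, s_t, a_emb, s_next) -> torch.Tensor:
        z = torch.cat([self.enc(*s_t), a_emb, self.enc(*s_next)], dim=-1)
        return torch.sigmoid(self.head(z))            # probability the transition is valid
```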

3. Induction of Action Spaces via Graph Rewriting

Unlike classical RL, where an action set is fixed, the world program approach induces actions by examining the deltas between consecutive states. In graph-based environments, this naturally corresponds to learning rewrite rules of the form p: L → R, replacing a subgraph L with a subgraph R. The system algorithmically:

  • Identifies changed nodes/edges.
  • Abstracts corresponding graph rewriting rules.
  • Grows the action set 𝒜 by adding observed novel rules (a minimal extraction sketch follows).
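A rough sketch of rule extraction from a single observed transition, using networkx and assuming plain node/edge deltas (real systems would canonicalise and generalise the extracted patterns):

```python
import networkx as nx

def extract_rewrite_rule(g_t: nx.Graph, g_next: nx.Graph):
    """Abstract a rewrite rule p: L -> R from the delta between consecutive states."""
    removed_edges = g_t.edges - g_next.edges             # edges deleted by the action
    added_edges = g_next.edges - g_t.edges               # edges created by the action
    lhs_nodes = {v for e in removed_edges for v in e} | (g_t.nodes - g_next.nodes)
    rhs_nodes = {v for e in added_edges for v in e} | (g_next.nodes - g_t.nodes)
    lhs = g_t.subgraph(lhs_nodes).copy()                 # L: pattern to match
    rhs = g_next.subgraph(rhs_nodes).copy()              # R: replacement
    return lhs, rhs
```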

A key practical application detailed in the paper is chemical synthesis planning, where each molecule is a labeled graph and actions correspond to chemical reaction rules. After processing large datasets of reaction steps, the system extracted a library of about 300,000 distinct graph rewriting actions. For efficiency in planning, a neural network is trained to select a small subset (top-k) of the most probable rewrite rules given a state, producing the mapping 𝒜(s).
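Given such a library, the top-k filtering step reduces to scoring every rule against a state embedding and keeping the highest-ranked candidates; rule_scorer below is an assumed network producing one logit per rule:

```python
import torch

def applicable_actions(state_embedding, rule_scorer, rules, k=50):
    """Approximate A(s): score all rules against the state embedding, keep the top-k."""
    logits = rule_scorer(state_embedding)              # shape: [len(rules)]
    top = torch.topk(logits, k=min(k, len(rules)))
    return [rules[i] for i in top.indices.tolist()]
```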

4. Model-Based Planning and Empirical Results

The induced world program (𝒜, 𝒜(s), 𝒯) is used as a black-box simulator for model-based planning tasks. Specifically, the framework is validated in a challenging chemical synthesis domain, where the task is retrosynthesis: recursively deconstructing a target molecule into known precursors via a sequence of learned reaction steps.

  • Monte Carlo Tree Search (MCTS), as instantiated in the PUCT-MCTS variant, is applied over the simulator defined by the world program.
  • The system solves 95.24% of tested synthesis planning tasks with an average time per plan of 13.0 seconds.

This demonstrates both that model-based planning is feasible in complex compositional domains via induced world programs and that learned action spaces and dynamics can rival hand-coded simulators in empirical performance.
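The PUCT selection rule at the heart of this search balances an action's estimated value against a prior-weighted exploration bonus. The sketch below assumes per-action statistics N (visit count), W (total value), and P (prior, e.g. taken from the top-k rule selector's score for the rule):

```python
import math

def puct_score(q, prior, n_parent, n_child, c_puct=1.5):
    """PUCT: exploitation term Q plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)

def select_action(children, c_puct=1.5):
    """Pick the action maximising the PUCT score at one tree node.

    `children` maps each action to its statistics dict with keys "N", "W", "P".
    """
    n_parent = sum(st["N"] for st in children.values()) + 1
    def score(a):
        st = children[a]
        q = st["W"] / st["N"] if st["N"] > 0 else 0.0
        return puct_score(q, st["P"], n_parent, st["N"], c_puct)
    return max(children, key=score)
```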

5. Extensibility and Outstanding Challenges

One of the principal challenges highlighted is generalization to new domains where ground-truth action spaces or simulators are unavailable. The formalism proposed in the paper is applicable beyond chemistry, with possible extensions to discrete games (Go, Chess) as grid-graph state/action domains, and even to relational domains involving automated reasoning or code optimization.

Key challenges include:

  • Learning faithful simulators from limited historical data.
  • Managing large action spaces and ensuring sampled plans do not exploit artifacts of imperfectly modeled dynamics.
  • Data efficiency in action/dynamics induction, especially for high-branching or continuous domains.

A central open question proposed for the community is the construction of world-program–based planners in domains where the true action and state spaces admit rich, compositional structure but are not directly given.

6. Methodological Implications and Future Directions

By formalizing the process of learning both the action space and the transition function jointly from observed state-to-state transitions, the world program approach broadens the scope of model-based RL beyond regimes with known, finite action sets and simple state encodings. The integration of symbolic program induction (graph rewriting extraction) with relational deep learning (GNNs) provides a scalable pattern for constructing powerful, general-purpose learned simulators.

Future directions include:

  • Extending the framework to continuous state/action spaces and robotic inverse dynamics.
  • Exploiting hierarchical or option-based RL methods, using learned atomic actions as building blocks for macro-actions.
  • Application to further relational domains, such as theorem proving, interpretable program synthesis, or complex physical construction planning.

7. Summary and Significance

The world state model, as instantiated by the world program formalism, enables agents to model and plan in previously intractable compositional spaces. Instead of requiring explicit enumeration of all possible actions or exhaustive simulators, the agent autonomously induces graph-based actions and dynamics rules from observed transitions, then deploys a neural-symbolic simulator for planning. This approach demonstrates practical success in domains such as chemical synthesis, achieving high plan completion rates and efficient planning times, and positions itself as a foundational strategy for scalable, model-based decision-making in structured, relational environments.
