World Model in GS AI
- World Model in GS AI is a modular internal representation that enables agents to perceive and plan in environments with limited observability and unknown rules.
- It employs interpretable finite automata and first-order logic to infer hidden states, interpret sensory feedback, and enforce game constraints in complex, arbitrary settings.
- The framework supports incremental learning, simulation-based planning, and adaptive decision-making, laying a foundation for robust General Strategy AI applications.
A world model in General Strategy (GS) AI, as introduced in "AI in arbitrary world" (arXiv:1210.2715), is an explicit, modular internal representation that enables an agent to perceive, reason about, and plan within environments where the state is only partially observable and the rules are not known a priori. The world model’s function is to facilitate inference of hidden state, interpretation of sensory feedback, and correct action selection, even in "arbitrary worlds" (those with unfamiliar, initially opaque structure). The paradigm is exemplified with a version of Tick-Tack-Toe ("World 2") in which the agent must act on local information only, motivating principled approaches for learning in unknown, partially observable systems.
1. Formal Problem Setting: Partial Observability and Arbitrary Worlds
The agent interacts with a world where full observation of the state is restricted—for instance, in "World 2," it perceives only the current cell content through lamp signals, never the entire board configuration. The environment must be inferred through sequences of local observations and feedback to actions (e.g., "bad move" indicators). The agent’s objective is to build an internal model robust enough to enable effective play and learning, even as the true system structure and rules are concealed.
Crucially, the combinatorial state space makes end-to-end memorization intractable: in Tick-Tack-Toe, there are 3^9 = 19,683 syntactically possible board configurations, most of which are illegal or irrelevant. The agent must discover and exploit hidden structure to manage this complexity.
2. Modular Construction Using Automata and First-order Logic
The methodology adopts a modular, hierarchical approach, decomposing the world model into:
a) Small, Behavior-Significant Automata
These are interpretable finite state machines that capture observable and latent aspects of the world:
- Automaton (1): Eye Column Position
- States: Left, Middle, Right
- Transitions: Based on "left"/"right" movement commands; attempting an illegal move (e.g., "left" from Left) results in a "bad move" signal, with no state change.
- Automaton (2): Eye Row Position
- States: Top, Middle, Bottom
- Transitions: Analogous to column automaton.
- Automaton (3): Game Over State
- States: In Progress, Game Over
- Transitions: Triggered by detection of victory/loss lamps.
Each automaton is designed for discovery and tractable manipulation, with sizes small enough for brute-force or heuristic search. These automata parse action–percept sequences into structured summaries of agent location and environment status.
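For concreteness, here is a minimal Python sketch of the column automaton; the class name `EyeColumnAutomaton`, the state labels, the default start state, and the boolean "bad move" return value are illustrative assumptions, not the paper's notation.

```python
class EyeColumnAutomaton:
    """Tracks the eye's column position: Left, Middle, Right.

    An illegal command (e.g., "left" while already at Left) leaves the
    state unchanged, modeling the world's "bad move" signal.
    """
    STATES = ("Left", "Middle", "Right")

    def __init__(self, start="Middle"):   # start state assumed for illustration
        self.state = start

    def step(self, command):
        """Apply "left"/"right"; return True iff the move was legal."""
        i = self.STATES.index(self.state)
        if command == "left":
            if i == 0:
                return False               # "bad move": no state change
            self.state = self.STATES[i - 1]
        elif command == "right":
            if i == len(self.STATES) - 1:
                return False               # "bad move": no state change
            self.state = self.STATES[i + 1]
        return True
```

The row automaton is structurally identical with states Top/Middle/Bottom, and the game-over automaton latches once a victory/loss lamp is observed.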
b) Second-Level Cell Automata
To avoid the intractability of whole-board enumeration, the model constructs nine "cell automata," each tracking the content (Empty, X, O) of a single board cell. Each automaton’s state can be observed only when the agent’s eye is over the corresponding cell, at which point the current lamp state is interpreted as that cell’s content. This factorization leverages the world’s modularity for scalable inference and learning.
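A minimal sketch of this factorization, assuming a hypothetical `CellTracker` class and lamp values "Empty"/"X"/"O":

```python
class CellTracker:
    """Second-level automaton for one board cell.

    Its content (Empty, X, O) is updated only when the eye sits over
    this cell, from the lamp currently observed there.
    """
    def __init__(self):
        self.content = None                 # None = not yet observed

    def observe(self, lamp):
        self.content = lamp                 # lamp in {"Empty", "X", "O"}

# Nine independent trackers instead of one 3^9-state board automaton.
board = {(r, c): CellTracker() for r in range(3) for c in range(3)}

def perceive(eye_row, eye_col, lamp):
    """Route the local observation to the cell currently under the eye."""
    board[(eye_row, eye_col)].observe(lamp)
```

This replaces a single automaton over 3^9 joint board configurations with nine independent three-state automata, which is what makes inference and learning tractable.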
c) First-Order Logic Formulas
World properties and constraints that cannot be conveniently encoded with finite automata are formalized as first-order logic formulas. An example is the rule preventing the opponent ("Tom") from placing more than one "O" per turn:

$$\forall t\;\forall c_1\;\forall c_2\;\big[(c_1 \neq c_2) \rightarrow \neg(\mathrm{NewO}(c_1, t) \wedge \mathrm{NewO}(c_2, t))\big],$$

with $\mathrm{NewO}(c, t)$ denoting that cell $c$ changes from Empty to "O" at time $t$. The formula codifies that no two different cells $c_1 \neq c_2$ acquire an "O" at the same time $t$, enforcing the move-legality constraint across the decomposed board representation.
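Checked against the factorized model, the constraint reduces to a simple predicate over consecutive board snapshots; the function names `new_o` and `tom_moved_legally` and the snapshot encoding are assumptions for illustration:

```python
def new_o(prev, curr, cell):
    """NewO(c, t): cell c changed from Empty to O between snapshots."""
    return prev[cell] == "Empty" and curr[cell] == "O"

def tom_moved_legally(prev, curr):
    """Enforce: no two distinct cells acquire an "O" in the same step.

    prev, curr: dicts mapping (row, col) -> content at times t-1 and t.
    """
    gained_o = [c for c in prev if new_o(prev, curr, c)]
    return len(gained_o) <= 1
```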
3. Learning and Adaptation in Unknown Environments
The agent does not begin with a hardcoded world model. Instead, it is expected to incrementally construct automata and infer rules via tabula rasa learning. Model components are detected by analyzing regularities in action–observation history: predictable lamp responses, consistent transitions, or systematic restrictions (such as "bad move" feedback) inform the structure and content of the automata and the rules they obey.
As the agent’s experience grows, new automata (capturing previously unmodeled aspects) or additional logical constraints can be grafted onto the world model, supporting continual adaptation as more of the world's rules become visible.
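One simple way to realize this kind of discovery is to tabulate observed (state, action, next-state) triples and flag contradictions, which indicate that the current decomposition misses a hidden variable. The sketch below, including the `infer_transitions` helper, is an assumed illustration rather than the paper's algorithm:

```python
from collections import defaultdict

def infer_transitions(history):
    """Build a candidate transition table from experience.

    history: iterable of (state, action, next_state) triples drawn from
    the agent's action-observation log. A (state, action) pair mapping
    to more than one successor is a conflict: evidence that the current
    automaton decomposition is too coarse and needs a new state or a
    new automaton.
    """
    table = defaultdict(set)
    for state, action, nxt in history:
        table[(state, action)].add(nxt)
    conflicts = {k: v for k, v in table.items() if len(v) > 1}
    return table, conflicts
```

Conflicting entries mark exactly the places where a previously unmodeled aspect of the world should be grafted onto the model.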
4. Planning and Decision-Making Using the World Model
Once a sufficient internal model is in place, classical planning algorithms such as Min-Max (game-tree search) become applicable: the agent can simulate future world states by iterating automata transitions and applying logical constraints. This enables the agent to select moves that maximize chances of victory (or minimize loss), as if it had access to full state information, thereby closing the perceptual gap imposed by partial observability.
The general framework supports the following pipeline:
- Perception: Map incoming observations to automaton state transitions and update cell status.
- Model update: Revise world model structure as new behaviors or rules are inferred.
- Planning: Apply logical and automata-derived constraints to simulate possible futures.
- Action selection: Choose actions expected to yield the best outcome under the inferred world dynamics.
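As a sketch of the planning step, a depth-limited Min-Max over the inferred model might look as follows; the `legal_moves`, `simulate`, `evaluate`, and `game_over` hooks stand in for automaton transitions plus logical constraints and are assumptions, not an API defined in the paper:

```python
def minimax(model, depth, maximizing):
    """Depth-limited Min-Max over simulated world-model states.

    model.legal_moves() applies the automata and logic constraints;
    model.simulate(move) returns the successor internal model;
    model.evaluate() scores terminal or frontier states.
    """
    if depth == 0 or model.game_over():
        return model.evaluate(), None
    best_move = None
    best = float("-inf") if maximizing else float("inf")
    for move in model.legal_moves():
        value, _ = minimax(model.simulate(move), depth - 1, not maximizing)
        if (maximizing and value > best) or (not maximizing and value < best):
            best, best_move = value, move
    return best, best_move
```

Because the simulation iterates the learned automata rather than the hidden true state, the quality of planning is bounded by the fidelity of the world model itself.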
5. Generalization and Foundation for GS AI
The architecture is not specific to Tick-Tack-Toe but is posited as a template for General Strategy AI (GS AI):
- Universality: By focusing on discoverable, modular, and interpretable automata/logics, the approach scales to arbitrary worlds, provided environmental regularities can be extracted and encoded within the chosen formalism.
- Compositionality: System decomposition into independently learnable elements (navigation, local state, global state, action constraints) addresses the state explosion problem and allows for robust model extension.
- Adaptability: The agent can cope with initially unseen or adversarial state transitions by incrementally refining its component automata or logic formulas as evidence accumulates, always maintaining compatibility with observed feedback.
6. Limitations and Practical Considerations
- Unwitnessed States: There exist legal world states the agent may never directly encounter; the model relies on capturing only the input–output interface, not full latent state enumeration.
- Expressivity: Automata and first-order logic together are sufficient for many environments, but may not capture all possible dynamics—rich or non-modular environments could require more expressive or higher-order formalisms.
- Learning Complexity: Automated discovery of concise automata and logical formulas remains a nontrivial algorithmic challenge and is an area for ongoing development.
7. Summary Table: World Model Components in GS AI
| Component | Representation | Learning Target | Role in GS AI |
|---|---|---|---|
| Navigational state | Finite automata (eye position) | Locomotion feedback | Localizes the agent, constrains moves |
| Local world state | Per-cell automata | Cell observations | Tracks board content under partial observability |
| Global constraints | First-order logic formulas | Transition patterns | Encodes game rules and invariants |
| Planning algorithm | Min-Max or similar | Simulated transitions | Decision-making over the world model |
World modeling, as formalized by modular automata and logical formulas, provides an effective and generalizable substrate for intelligent behavior in arbitrary, partially observable environments—a foundational requirement for GS AI systems.