Executable coherence of multi-step trajectories from LLM-based world models

Establish whether large language model–based world models can generate coherent multi-step trajectories that remain executable when transferred to the corresponding real environments.

Background

Existing evaluations have largely emphasized single-step predictions and have not fully addressed long-horizon consistency or compounding errors, both of which are critical for using world models as simulators or for model-based agent training.

The authors therefore test long-horizon rollout stability, world model–to–real transfer, and generalization across environments and agents to address this open question.
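
To make "executable when transferred" concrete, the check can be pictured as replaying a world-model rollout in the real environment and counting how far it gets. The following is a minimal sketch under stated assumptions, not the authors' evaluation protocol: the toy transition table `REAL_ENV`, the function `replay`, and the two example rollouts are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "real" environment: (state, action) -> next state.
# Any (state, action) pair missing from the table is an invalid action.
REAL_ENV: Dict[Tuple[str, str], str] = {
    ("kitchen", "open fridge"): "fridge open",
    ("fridge open", "take milk"): "holding milk",
    ("holding milk", "close fridge"): "done",
}


def replay(
    real_env: Dict[Tuple[str, str], str],
    start: str,
    actions: List[str],
    predicted: List[str],
) -> Tuple[int, int]:
    """Replay a world-model rollout in the real environment.

    Returns (executable_steps, consistent_steps):
    - executable_steps: actions the real environment accepted before the
      first invalid action;
    - consistent_steps: accepted steps whose real next state equals the
      state the world model predicted.
    """
    state = start
    executable = 0
    consistent = 0
    for action, pred in zip(actions, predicted):
        nxt = real_env.get((state, action))
        if nxt is None:
            # The rollout stops being executable at this step.
            break
        executable += 1
        consistent += int(nxt == pred)
        state = nxt
    return executable, consistent


# A rollout that matches the real environment at every step.
good = replay(
    REAL_ENV,
    "kitchen",
    ["open fridge", "take milk", "close fridge"],
    ["fridge open", "holding milk", "done"],
)

# A rollout where the world model imagined an object that does not exist,
# so the second action cannot be executed in the real environment.
drift = replay(
    REAL_ENV,
    "kitchen",
    ["open fridge", "take egg", "close fridge"],
    ["fridge open", "holding egg", "done"],
)

print(good, drift)  # (3, 3) (1, 1)
```

In this toy setup a single wrong step ends the replay, which is one simple way to see why single-step accuracy alone says little about whether a whole trajectory transfers.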

References

Consequently, it remains an open question whether LLM-based world models can produce coherent multi-step trajectories that are executable in real environments.

From Word to World: Can Large Language Models be Implicit Text-based World Models? (2512.18832 - Li et al., 21 Dec 2025) in Related Works (Section 2)