Learning procedures and reliability thresholds for LLM-based world models
Develop and evaluate training methodologies for large language model–based world models, and establish criteria for determining when these models are reliable enough to improve the performance of downstream agents.
References
While prior work has explored LLMs as simulators, experience generators, or planning interfaces \citep{chen2025scalingagentlearningexperience,li2025simulatingenvironmentsreasoningmodels,wu2025rlvrworldtrainingworldmodels,gu2025llmsecretlyworldmodel,wang2025world,he2025pretrained}, it remains unclear how to learn a world model and when it is reliable enough to improve downstream agents.
— From Word to World: Can Large Language Models be Implicit Text-based World Models?
(2512.18832 - Li et al., 21 Dec 2025) in Introduction (Section 1)