Effective and scalable acquisition of world models for LLM agents

Establish effective and scalable procedures by which large language model agents acquire internal world models, comprising explicit state representations and transition-dynamics modeling, so that they can ground their reasoning in environment rules and generalize to out-of-distribution interactive settings.

Background

The paper argues that LLM agents trained with reinforcement learning struggle in out-of-distribution environments because they lack an internal world model that captures states and transition dynamics. The authors decompose such a world model into state representation and transition modeling and propose SPA, a self-play supervised finetuning stage followed by PPO, as a method to inject this knowledge before policy optimization.
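To make the state/transition decomposition concrete, here is a minimal sketch of turning self-play rollouts into supervised finetuning examples. It is not the authors' implementation: the corridor environment, the random action choice standing in for the agent's own policy, the prompt format, and every name (`Transition`, `render`, `step`, `collect_self_play`, `to_sft_example`) are hypothetical and introduced only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: str       # textual state representation before the action
    action: str      # action taken during self-play
    next_state: str  # textual state representation after the action


def render(pos: int, length: int) -> str:
    """Explicit state representation: agent 'A' on a corridor of '_' cells."""
    return "".join("A" if i == pos else "_" for i in range(length))


def step(pos: int, action: str, length: int) -> int:
    """Toy transition dynamics: move one cell, clamped to the corridor ends."""
    delta = {"left": -1, "right": 1}[action]
    return max(0, min(length - 1, pos + delta))


def collect_self_play(num_steps: int, length: int = 5, seed: int = 0) -> list[Transition]:
    """Roll out a policy and record (state, action, next_state) triples.

    Assumption: a uniform random policy stands in for the agent itself.
    """
    rng = random.Random(seed)
    pos = length // 2
    transitions = []
    for _ in range(num_steps):
        action = rng.choice(["left", "right"])
        nxt = step(pos, action, length)
        transitions.append(Transition(render(pos, length), action, render(nxt, length)))
        pos = nxt
    return transitions


def to_sft_example(t: Transition) -> dict[str, str]:
    """Format one transition as a prompt/completion pair (hypothetical format)."""
    return {
        "prompt": f"State: {t.state}\nAction: {t.action}\nNext state:",
        "completion": f" {t.next_state}",
    }


# A small supervised dataset that teaches next-state prediction.
dataset = [to_sft_example(t) for t in collect_self_play(num_steps=100)]
```

In this sketch, a model finetuned on `dataset` would be trained to predict the next state from the current state and action before any policy optimization, which mirrors the ordering of SFT followed by PPO described above.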

While classical model-based RL demonstrates the benefits of explicit world models, the authors note that for LLM agents the central challenge is not only how to leverage a world model but also how to acquire one effectively and at scale. This motivates their framework and underscores the broader open problem of scalable world-model acquisition for agentic LLMs.

References

However, the question of how to effectively and scalably acquire such world models for LLM agents remains open.

— Internalizing World Models via Self-Play Finetuning for Agentic RL (2510.15047 - Chen et al., 16 Oct 2025) in Section 1: Introduction
