Planning capability of robot video world models in unseen scenarios

Determine whether video generative world models for robot manipulation, which are typically trained and evaluated on in-distribution data, can effectively facilitate planning in out-of-distribution scenarios not seen during training.

Background

Recent works have introduced video generative models to simulate robot manipulation tasks, with potential benefits for policy evaluation, reinforcement learning, and policy steering. However, these models are commonly evaluated only on data drawn from the same distribution as their training sets, leaving their behavior under distribution shift largely untested.

The generalization of such models to novel environments and tasks remains a critical question for deploying general-purpose robots. Establishing whether these world models can reliably facilitate planning in unseen scenarios would clarify their utility beyond controlled settings.
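To make "facilitating planning" concrete, the sketch below shows one common way a video world model can serve as a planner: random-shooting model-predictive control, where candidate action sequences are rolled out in the model and scored against a goal image. This is a minimal illustrative sketch only; the VideoWorldModel interface, the stubbed dynamics, the frame-distance reward, and all parameters are assumptions for exposition, not the API of any specific model from the literature.

```python
import numpy as np

class VideoWorldModel:
    """Hypothetical stand-in for a learned video generative world model.

    A real model would autoregressively generate future frames conditioned
    on the current observation and a candidate action sequence; here a
    trivial dynamics stub is substituted so the planning loop is runnable.
    """

    def rollout(self, obs: np.ndarray, actions: np.ndarray) -> np.ndarray:
        # Return one predicted frame per action step (placeholder dynamics).
        frames = [obs]
        for a in actions:
            frames.append(frames[-1] + 0.1 * a.mean())  # stub update rule
        return np.stack(frames[1:])

def goal_reward(frames: np.ndarray, goal: np.ndarray) -> float:
    # Score a rollout by how close its final predicted frame is to a goal
    # image; real systems often use a learned success detector instead.
    return -float(np.linalg.norm(frames[-1] - goal))

def plan(model, obs, goal, horizon=8, action_dim=7, n_samples=256, rng=None):
    """Random-shooting MPC: sample action sequences, roll each one out in
    the world model, and return the first action of the best sequence."""
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    scores = [goal_reward(model.rollout(obs, seq), goal) for seq in candidates]
    best = candidates[int(np.argmax(scores))]
    return best[0]

if __name__ == "__main__":
    model = VideoWorldModel()
    obs = np.zeros((64, 64, 3))        # current camera frame (stub)
    goal = np.full((64, 64, 3), 0.5)   # goal image (stub)
    action = plan(model, obs, goal)
    print("planned first action:", action.round(2))
```

Under this framing, the open question above amounts to whether the model's rollouts remain accurate enough in unseen scenarios for such a loop to rank candidate action sequences correctly.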

References

"Motivated by successes in these domains, recent works have also introduced video generative models to simulate robot manipulation tasks, which hold great promise for scalable policy evaluation, reinforcement learning, and policy steering. However, existing models are typically trained and evaluated in in-distribution settings, leaving it unclear whether these models can truly facilitate planning in unseen scenarios."

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos (arXiv:2602.06949, Gao et al., 6 Feb 2026), Section: Related Work (World model)