Capturing action-conditioned physical dynamics for robotic manipulation
Develop action-conditioned, video-based world models that accurately capture contact-rich interactions, robot kinematics, and fine-grained physical dynamics. These capabilities are needed to predict object motion under specified 7-DoF end-effector actions and to support closed-loop planning and control for robotic manipulation tasks in RLBench.
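To make the closed-loop setting concrete, the following is a minimal sketch of how an action-conditioned model could sit inside a replanning loop. It is an illustration under stated assumptions, not the method of the cited paper: `toy_world_model` is a hypothetical stand-in that tracks only an end-effector position instead of predicting video frames, the 7-DoF action layout (3 translation, 3 rotation, 1 gripper) is assumed, and the planner is plain random shooting chosen for brevity.

```python
import random

ACTION_DIM = 7  # assumed layout: 3 translation, 3 rotation, 1 gripper


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned action-conditioned world model.

    The state is an (x, y, z) end-effector position. Only the translation
    part of the action changes it; rotation and gripper are ignored here.
    """
    return tuple(s + a for s, a in zip(state, action[:3]))


def sample_action(rng, max_step=0.05):
    """Draw one random 7-DoF action with bounded components."""
    return [rng.uniform(-max_step, max_step) for _ in range(ACTION_DIM)]


def distance(a, b):
    """Euclidean distance between two positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def plan_first_action(state, goal, rng, num_candidates=64, horizon=5):
    """Random-shooting planner.

    Samples candidate action sequences, rolls each one out in the model,
    and returns the first action of the sequence whose predicted final
    state is closest to the goal.
    """
    best_cost, best_action = float("inf"), None
    for _ in range(num_candidates):
        sequence = [sample_action(rng) for _ in range(horizon)]
        predicted = state
        for action in sequence:
            predicted = toy_world_model(predicted, action)
        cost = distance(predicted, goal)
        if cost < best_cost:
            best_cost, best_action = cost, sequence[0]
    return best_action


def closed_loop_control(start, goal, steps=40, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = plan_first_action(state, goal, rng)
        # Assumption: the toy model also plays the role of the environment.
        # In a real setup the simulator or robot would return the next state.
        state = toy_world_model(state, action)
    return state


if __name__ == "__main__":
    start, goal = (0.0, 0.0, 0.0), (0.3, -0.2, 0.4)
    final = closed_loop_control(start, goal)
    print("distance to goal:", distance(final, goal))
```

The loop executes only the first action of the best sequence and then replans from the new state, which is what makes it closed-loop. In this toy setup the model and the environment coincide, so prediction errors never accumulate; the open problem is precisely that a learned video model's errors in contact-rich scenes would mislead the scoring step.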
References
This gap suggests that while current visual world models can effectively guide perception and navigation, capturing fine-grained physical dynamics and action-conditioned object motion remains an open challenge.
— World-in-World: World Models in a Closed-Loop World
(2510.18135 - Zhang et al., 20 Oct 2025) in Section 4.1 (Benchmark Results), Robotic Manipulations