Extending VLAs and data pipelines beyond rigid-body dynamics
Extend Vision-Language-Action (VLA) models and dynamic-manipulation data-collection pipelines to tasks involving non-rigid or fluid objects with continuously evolving states, in both simulation and real-world settings.
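As an illustration of why a rigid-body state assumption does not carry over to such objects, the sketch below compares a rigid motion with a simple deformation. It is not taken from the DynamicVLA paper: the helper names (`rigid_transform`, `bend`, `preserves_distances`) and the quadratic bending model are hypothetical, chosen only to show that a single pose can describe every point of a rigid object but not of a deforming one.

```python
import math


def pairwise_distances(points):
    """Return all pairwise Euclidean distances of a point list, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def rigid_transform(points, yaw, translation):
    """Rotate points about the z-axis by `yaw` radians, then translate them."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def bend(points, curvature):
    """Hypothetical non-rigid deformation: lift each point by curvature * x^2."""
    return [(x, y, z + curvature * x * x) for x, y, z in points]


def preserves_distances(before, after, tol=1e-9):
    """Check whether every pairwise distance is unchanged (necessary for rigid motion)."""
    d0, d1 = pairwise_distances(before), pairwise_distances(after)
    return all(abs(a - b) <= tol for a, b in zip(d0, d1))


# Four points sampled along a flat strip (for example, the edge of a cloth).
strip = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Rigid motion: one pose (yaw + translation) determines where every point goes.
moved = rigid_transform(strip, yaw=0.7, translation=(0.5, -0.2, 0.1))

# Deformation: points move relative to each other, so no single pose fits.
bent = bend(strip, curvature=0.2)

print("rigid motion keeps distances:", preserves_distances(strip, moved))
print("bending keeps distances:", preserves_distances(strip, bent))
```

In this toy setting the rigid case is fully described by a handful of pose parameters, while the bent strip needs per-point (or otherwise higher-dimensional) state. Fluids are harder still, since the set of points itself is not fixed over time.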
References
Our data pipeline assumes rigid-body state estimation, whereas many dynamic tasks involve non-rigid or fluid dynamics with continuously evolving states that are difficult to represent in both simulation and the real world. Extending VLA models and data pipelines to such settings remains an open challenge.
— DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
(2601.22153 - Xie et al., 29 Jan 2026) in Section: Discussion and Future Work