Complex interactions with deformable objects and human collaboration remain unexplored
Extend the VisualMimic hierarchical sim-to-real framework to support complex humanoid loco-manipulation involving deformable objects and collaboration with humans. The framework consists of a task-agnostic low-level keypoint tracker trained from human motion data and a task-specific high-level visuomotor keypoint generator. The extended system should run from egocentric visual and proprioceptive inputs alone, without external object state estimation.
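To make the two-level interface concrete, the following is a minimal sketch of one control step. It is not the VisualMimic implementation. Every name in it (`Observation`, `control_step`, `toy_high_level`, `toy_low_level`) is hypothetical, and the toy policies are placeholders standing in for learned networks. It shows only the data flow stated above: the high-level policy sees egocentric vision and proprioception and emits keypoint targets, and the low-level tracker turns those targets into motor commands using proprioception only.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Observation:
    """Onboard inputs only; no external object state is included."""
    egocentric_image: Vector  # flattened egocentric image (hypothetical format)
    proprioception: Vector    # e.g. joint positions (hypothetical layout)


# Hypothetical interfaces; the actual signatures in the paper may differ.
HighLevelPolicy = Callable[[Observation], Vector]     # obs -> keypoint targets
LowLevelTracker = Callable[[Vector, Vector], Vector]  # (keypoints, proprio) -> command


def control_step(obs: Observation,
                 high_level: HighLevelPolicy,
                 low_level: LowLevelTracker) -> Vector:
    """Run one hierarchical step: generate keypoints, then track them."""
    keypoints = high_level(obs)
    return low_level(keypoints, obs.proprioception)


def toy_high_level(obs: Observation) -> Vector:
    """Placeholder task policy: one target per joint from the mean pixel value."""
    mean = sum(obs.egocentric_image) / len(obs.egocentric_image)
    return [mean] * len(obs.proprioception)


def toy_low_level(keypoints: Vector, proprio: Vector) -> Vector:
    """Placeholder tracker: move each joint halfway toward its target."""
    return [q + 0.5 * (k - q) for k, q in zip(keypoints, proprio)]


obs = Observation(egocentric_image=[0.2, 0.4], proprioception=[0.0, 0.0, 0.0])
command = control_step(obs, toy_high_level, toy_low_level)
```

Under this split, a new task such as handling a deformable object or working alongside a person would swap in a different `high_level` callable and leave `low_level` unchanged. Whether a fixed tracker suffices for such tasks is exactly what remains open.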
References
While our hierarchical design generalizes across a range of loco-manipulation tasks, more complex interactions involving deformable objects or human collaboration remain unexplored.
— VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
(2509.20322 - Yin et al., 24 Sep 2025) in Conclusions and Limitations, Limitations paragraph