Sim-to-Real Transfer for Vision-Based Manipulation in Cluttered, Unstructured Environments

Determine which techniques reliably transfer robot manipulation policies, trained in simulation on synthetically rendered visual observations, to cluttered, unstructured real-world environments, particularly for tasks that involve complex contact dynamics.

Background

The paper (Liang et al., 2026) discusses prior work on autonomous data generation in simulation and notes that policies trained this way remain difficult to deploy in the real world. In particular, policies trained on synthetic renders are hard to transfer to cluttered, unstructured environments, especially when tasks involve complex contact dynamics.

To avoid these limitations, the authors propose generating data directly in the real world via their Tether system. Nonetheless, they explicitly acknowledge that achieving robust sim-to-real transfer in such settings remains unresolved in the broader literature.

References

However, simulation-based approaches struggle with sim-to-real transfer within cluttered, unstructured environments, which remains an open challenge especially for tasks involving complex contacts and vision-based policies trained on synthetic renders (Blanco-Mulero et al., 2024; Yu et al., 2024; Lin et al., 2025b).

Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping (2603.03278 - Liang et al., 3 Mar 2026) in Section 2, Related Work