
Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation (1910.09470v1)

Published 21 Oct 2019 in cs.RO and cs.CV

Abstract: Collecting and automatically obtaining reward signals from real robotic visual data for the purposes of training reinforcement learning algorithms can be quite challenging and time-consuming. Methods for utilizing unlabeled data can have a huge potential to further accelerate robotic learning. We consider here the problem of performing manipulation tasks from pixels. In such tasks, choosing an appropriate state representation is crucial for planning and control. This is even more relevant with real images where noise, occlusions and resolution affect the accuracy and reliability of state estimation. In this work, we learn a latent state representation implicitly with deep reinforcement learning in simulation, and then adapt it to the real domain using unlabeled real robot data. We propose to do so by optimizing sequence-based self-supervised objectives. These exploit the temporal nature of robot experience, and can be common in both the simulated and real domains, without assuming any alignment of underlying states in simulated and unlabeled real images. We propose the Contrastive Forward Dynamics loss, which combines dynamics model learning with time-contrastive techniques. The learned state representation that results from our methods can be used to robustly solve a manipulation task in simulation and to successfully transfer the learned skill to a real system. We demonstrate the effectiveness of our approaches by training a vision-based reinforcement learning agent for cube stacking. Agents trained with our method, using only 5 hours of unlabeled real robot data for adaptation, show a clear improvement over domain randomization and standard visual domain adaptation techniques for sim-to-real transfer.

Authors (8)
  1. Rae Jeong (9 papers)
  2. Yusuf Aytar (36 papers)
  3. David Khosid (2 papers)
  4. Yuxiang Zhou (33 papers)
  5. Jackie Kay (19 papers)
  6. Thomas Lampe (25 papers)
  7. Konstantinos Bousmalis (18 papers)
  8. Francesco Nori (51 papers)
Citations (53)

Summary

Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation

The paper "Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation" presents a methodology for improving sim-to-real transfer in robotic manipulation tasks. The approach uses self-supervised learning to adapt representations learned in simulation to real-world robotic data, addressing persistent challenges in robot learning such as noise, occlusions, and resolution limits in visual state estimation.

Methodology

The authors propose a two-stage adaptation process. Initially, they develop both state-based and vision-based agents within a simulated environment. The state-based agent serves as an asymmetric behavior policy, benefiting from the privileged information in the simulation, which in turn provides rich feedback for training the vision-based agent to learn manipulation skills. A shared replay buffer coupled with a behavior cloning (BC) objective further enhances the sample efficiency for the vision-based agent during this phase.
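The first stage can be illustrated with a minimal numpy sketch. The linear "policies", the feature dimensions, and the buffer layout below are placeholders for the actual networks and replay infrastructure described in the paper; the point is only the structure of the shared buffer and the behavior cloning objective, in which the vision-based agent regresses onto the actions of the privileged state-based teacher on the same transitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: privileged simulator state, image features, actions.
STATE_DIM, FEAT_DIM, ACT_DIM = 8, 64, 4

# Shared replay buffer: transitions collected by the state-based (teacher)
# policy are stored with both the privileged state and the corresponding
# image observation, so the vision-based agent can train on the same data.
buffer = [
    {"state": rng.normal(size=STATE_DIM),
     "image_feat": rng.normal(size=FEAT_DIM)}
    for _ in range(1000)
]

def teacher_policy(states, W_teacher):
    # Stand-in for the state-based agent acting on privileged information.
    return np.tanh(states @ W_teacher)

def vision_policy(feats, W_vision):
    # Stand-in for the vision-based agent acting on image features.
    return np.tanh(feats @ W_vision)

def bc_loss(batch, W_teacher, W_vision):
    """Behavior-cloning objective: mean squared error between the vision
    policy's actions and the teacher's actions on shared-buffer transitions."""
    states = np.stack([b["state"] for b in batch])
    feats = np.stack([b["image_feat"] for b in batch])
    target = teacher_policy(states, W_teacher)
    pred = vision_policy(feats, W_vision)
    return np.mean((pred - target) ** 2)
```

In the paper this BC term supplements the reinforcement learning objective rather than replacing it; here it is shown in isolation for clarity.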

In the second stage, the authors introduce self-supervised sim-to-real adaptation, optimizing sequence-based objectives on both simulated and real robot data. The adaptation employs a technique similar to modality tuning, so that skills learned in simulation transfer to the real setting without costly manual alignment between simulated and real data.

Contrastive Forward Dynamics

A notable aspect of this work is the introduction of Contrastive Forward Dynamics (CFD). CFD utilizes sequence-based self-supervision, enhancing the sim-to-real transfer by learning latent transition dynamics. This contrasts with the typical time-contrastive networks (TCN) approach, which primarily focuses on temporal consistency of observations. CFD embeds the latent space dynamics by predicting transitions in a way that leverages both observations and actions, drawing closer to model-based control methodologies.
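A minimal sketch of a CFD-style loss is given below. The linear forward-dynamics model and the InfoNCE-style scoring are simplifying assumptions standing in for the learned networks and the exact contrastive formulation in the paper; the sketch shows the core idea of predicting the next latent from the current latent and action, then scoring the true next latent against the other latents in the batch:

```python
import numpy as np

def cfd_loss(z_t, actions, z_next, W):
    """Contrastive Forward Dynamics loss (simplified sketch).

    z_t:     (B, D) latent states from the image encoder at time t
    actions: (B, A) actions taken at time t
    z_next:  (B, D) latent states at time t+1 (positives)
    W:       (D + A, D) linear forward-dynamics model, a stand-in for
             the learned dynamics network

    The model predicts z_{t+1} from (z_t, a_t); the true next latent is
    the positive, and the other next latents in the batch serve as
    negatives, so the encoder is pushed to capture action-conditioned
    transition dynamics rather than only temporal consistency.
    """
    pred = np.concatenate([z_t, actions], axis=1) @ W   # (B, D) predictions
    logits = pred @ z_next.T                            # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # positives on diagonal
```

Because the same loss can be computed on unlabeled real sequences, minimizing it on both domains is what adapts the simulation-trained encoder to real images.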

Experimental Outcomes

The experimental analysis reveals the superiority of the proposed methodology over standard domain randomization strategies. Specifically, the two-stage adaptation approach with CFD yielded a task success rate of 62% in real-world cube stacking—substantially outperforming other methods like DANN and zero-shot domain randomization. This demonstrates the efficacy of CFD in bridging domain gaps, thereby facilitating robust skill transfer from simulation to reality.

Implications and Future Prospects

The implications of this paper extend to both theoretical and practical domains. Theoretically, it opens discussions on the role of sequence-based objectives in unsupervised domain adaptation, suggesting potential pathways toward more robust sim-to-real transfer frameworks. Practically, the methodology can strengthen the reliability and applicability of robotic systems in complex manipulation tasks while reducing reliance on extensive labeled real-world data.

Future research could explore dynamic adaptation strategies that accommodate continuously evolving environments or object properties. Additionally, extending this approach to multi-agent systems or more intricate manipulation tasks could further demonstrate the versatility and scalability of self-supervised adaptation methods within AI systems.
