Learning Predictive Representations for Deformable Objects Using Contrastive Estimation (2003.05436v1)

Published 11 Mar 2020 in cs.LG, cs.CV, cs.RO, and stat.ML

Abstract: Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models. In this work, we propose a new learning framework that jointly optimizes both the visual representation model and the dynamics model using contrastive estimation. Using simulation data collected by randomly perturbing deformable objects on a table, we learn latent dynamics models for these objects in an offline fashion. Then, using the learned models, we use simple model-based planning to solve challenging deformable object manipulation tasks such as spreading ropes and cloths. Experimentally, we show substantial improvements in performance over standard model-based learning techniques across our rope and cloth manipulation suite. Finally, we transfer our visual manipulation policies trained on data purely collected in simulation to a real PR2 robot through domain randomization.
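The planning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-step random-shooting planner: sample candidate actions, predict the next latent state with the learned forward model, and pick the action whose prediction lies closest (in squared L2 distance) to the encoded goal image. The names `plan_one_step`, `encode`, and `forward`, the uniform action sampling, and the distance choice are illustrative assumptions, not the paper's exact planner.

```python
import numpy as np

def plan_one_step(encode, forward, obs, goal_obs, action_low, action_high,
                  num_samples=1000, rng=None):
    """Hypothetical one-step random-shooting planner in a learned latent space.

    encode:  maps an observation to a latent vector of shape (d,)
    forward: maps latents (N, d) and actions (N, k) to predicted next latents (N, d)
    """
    rng = np.random.default_rng() if rng is None else rng
    action_low = np.asarray(action_low, dtype=float)
    action_high = np.asarray(action_high, dtype=float)

    z = encode(obs)            # current latent state
    z_goal = encode(goal_obs)  # goal latent state

    # Sample candidate actions uniformly inside the action bounds.
    actions = rng.uniform(action_low, action_high,
                          size=(num_samples, action_low.shape[0]))

    # Predict the next latent state for every candidate action.
    z_batch = np.repeat(z[None, :], num_samples, axis=0)
    z_next = forward(z_batch, actions)

    # Choose the action whose predicted latent is closest to the goal latent.
    dists = np.sum((z_next - z_goal) ** 2, axis=1)
    return actions[np.argmin(dists)]


# Toy stand-ins for the learned networks (assumptions for illustration only).
def toy_encode(observation):
    return np.asarray(observation, dtype=float)

def toy_forward(latents, actions):
    return latents + actions

best = plan_one_step(toy_encode, toy_forward, np.zeros(2), np.array([0.5, -0.5]),
                     [-1.0, -1.0], [1.0, 1.0], num_samples=2000,
                     rng=np.random.default_rng(0))
```

With these toy models the chosen action should land near the displacement that moves the state onto the goal. A real system would replace `toy_encode` and `toy_forward` with the trained encoder and latent dynamics model.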

Authors (4)

Wilson Yan
Ashwin Vangipuram
Pieter Abbeel
Lerrel Pinto

Citations (180)

Summary

The paper introduces Contrastive Forward Modeling (CFM), a method for learning latent dynamics models for deformable object manipulation.
It uses an InfoNCE-based contrastive loss to jointly train the visual encoder and the latent forward model, and it outperforms standard model-based learning techniques on simulated rope and cloth tasks.
Real-world experiments on a PR2 robot show that policies trained purely in simulation transfer to the real world, with domain randomization enabling the robot to reach target configurations.

This paper introduces a framework for learning the predictive representations needed to manipulate deformable objects, such as ropes and cloths, using contrastive estimation. The goal is to overcome a central difficulty of visual model-based learning for such objects: they lack a compact canonical state representation, and their dynamics are complex and nonlinear. The authors jointly train a visual representation model and a latent dynamics model with a contrastive objective on simulated data, then evaluate the learned models in simulation and on a real robot.

Methodological Framework

The framework, termed Contrastive Forward Modeling (CFM), uses contrastive learning to obtain a latent space that captures both the visual appearance and the dynamics of deformable objects. Rather than modeling future frames directly in pixel space, CFM encodes each observation into a latent vector and learns a forward model that predicts how that latent vector changes under an action. The encoder and forward model are trained jointly with the InfoNCE loss, which maximizes a lower bound on the mutual information between the predicted next latent state and the encoding of the observed next frame. Because the objective contrasts each prediction against other samples instead of reconstructing pixels, the resulting model is predictive rather than generative.
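The InfoNCE objective described above can be sketched in NumPy. This minimal version assumes in-batch negatives (each predicted latent is scored against every encoded next observation in the batch) and a score equal to the negative squared L2 distance between latents; the function name `cfm_infonce_loss` is hypothetical. In practice the encoder and forward model are neural networks trained by backpropagating through this loss, which the sketch only evaluates.

```python
import numpy as np

def cfm_infonce_loss(z_pred, z_next):
    """InfoNCE loss for a batch of latent transitions (sketch).

    z_pred: (B, d) predicted next latents, i.e. forward(encode(o_t), a_t)
    z_next: (B, d) encodings of the observed next frames, encode(o_{t+1})
    Row i of z_next is the positive for row i of z_pred; the other rows are negatives.
    """
    # Score matrix (assumed form): negative squared L2 distance for every pair.
    diff = z_pred[:, None, :] - z_next[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1)  # shape (B, B)

    # Numerically stable log-softmax over the candidates in each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching pair (the diagonal) as the correct class.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z_next = rng.normal(size=(16, 8))
matched = cfm_infonce_loss(z_next, z_next)                          # perfect predictions
mismatched = cfm_infonce_loss(np.roll(z_next, 1, axis=0), z_next)  # shifted pairs
```

For batch size B, InfoNCE satisfies I ≥ log B − L, so lowering the loss tightens the mutual-information bound. Correctly paired predictions give a much lower loss than mismatched ones, which is the signal that shapes the latent space.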

Experimental Setup and Results

In simulation, the authors evaluate their approach on multi-goal rope and cloth manipulation tasks, in which the robot must reach target configurations with varying shapes and orientations. CFM substantially outperforms baselines such as pixel-space visual forward models and autoencoder-based latent models. Performance is measured by the distances between corresponding geoms (the simulated geometric elements that make up the rope or cloth) in the achieved and target states; lower distances mean the achieved configuration more closely matches the goal, and CFM attains lower distances than the baselines.
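The geom-distance metric can be made concrete with a short sketch. It assumes the simulator exposes one 3D position per geom for both the achieved and the goal configuration, with row i of each array referring to the same geom, and it reports the mean Euclidean distance over geoms. The function name `geom_distance` and the choice of averaging rather than summing are assumptions, not details taken from the paper.

```python
import numpy as np

def geom_distance(achieved, goal):
    """Mean Euclidean distance between corresponding geoms (sketch).

    achieved, goal: (N, 3) arrays of geom positions; row i of each
    array is assumed to refer to the same geom.
    """
    achieved = np.asarray(achieved, dtype=float)
    goal = np.asarray(goal, dtype=float)
    if achieved.shape != goal.shape:
        raise ValueError("achieved and goal must have the same shape")
    # Per-geom distance, then average over all geoms.
    return float(np.mean(np.linalg.norm(achieved - goal, axis=1)))


achieved = np.zeros((3, 3))
goal = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
score = geom_distance(achieved, goal)  # one geom is off by 5, so the mean is 5/3
```

Comparing geoms by index assumes a fixed correspondence between simulated elements, which holds when the rope or cloth is built from the same set of geoms in every episode.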

Real-World Application

Beyond simulation, the paper evaluates whether the learned models transfer to a real PR2 robot, using domain randomization during training so that models trained only on simulated images generalize to real camera images. On these tasks, CFM reliably reaches the desired goal states, as measured by pixel-wise intersection metrics between the achieved and target images of the rope and cloth configurations.
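A pixel-intersection metric of this kind can be sketched as intersection over union (IoU) between binary object masks. The exact metric used in the paper may differ; the helper names `segment_by_threshold` and `mask_iou`, and the assumption that the object is brighter than the background in a grayscale image scaled to [0, 1], are illustrative.

```python
import numpy as np

def segment_by_threshold(gray_image, threshold=0.5):
    """Hypothetical segmentation: pixels brighter than `threshold` belong to the object."""
    return np.asarray(gray_image, dtype=float) > threshold

def mask_iou(achieved_mask, goal_mask):
    """Intersection over union of two boolean masks of the same shape."""
    a = np.asarray(achieved_mask, dtype=bool)
    g = np.asarray(goal_mask, dtype=bool)
    if a.shape != g.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(a, g).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(a, g).sum() / union)


achieved = segment_by_threshold(np.array([[0.9, 0.9, 0.1, 0.1]]))
goal = segment_by_threshold(np.array([[0.1, 0.9, 0.9, 0.1]]))
iou = mask_iou(achieved, goal)  # 1 shared pixel out of 3 covered pixels
```

Normalizing by the union penalizes both missing and extra object pixels, so a rope spread over the wrong region scores low even if it covers the target area.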

Theoretical and Practical Implications

On the theoretical side, the results suggest that contrastive estimation can yield latent space models that generalize better on complex robotics tasks than models trained to reconstruct pixels. Practically, because the models are trained entirely on simulated data and transferred with domain randomization, the approach reduces the need for extensive data collection in real-world environments. This is a significant advantage in robotics, where real-world data collection is resource-intensive.

Future Directions

The success of CFM suggests several avenues for future research, including larger and more complex deformable objects in real-world environments. Applying the method on other robot platforms and to tasks beyond cloth and rope manipulation could further establish its versatility. Finally, refining the contrastive loss function could improve model robustness and offer new insight into the structure of the learned embeddings.

In conclusion, the paper presents a contrastive approach to learning predictive representations for deformable object manipulation, improving task performance over standard model-based baselines in simulation and transferring to a real robot. The approach is a useful contribution to robotic manipulation of deformable objects, showing that latent dynamics models trained in simulation with a contrastive objective can support planning on real hardware.

PDF Markdown