
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch (1805.11085v2)

Published 28 May 2018 in cs.RO, cs.LG, and stat.ML

Abstract: For humans, the process of grasping an object relies heavily on rich tactile feedback. Most recent robotic grasping work, however, has been based only on visual input, and thus cannot easily benefit from feedback after initiating contact. In this paper, we investigate how a robot can learn to use tactile information to iteratively and efficiently adjust its grasp. To this end, we propose an end-to-end action-conditional model that learns regrasping policies from raw visuo-tactile data. This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions. Our approach requires neither calibration of the tactile sensors, nor any analytical modeling of contact forces, thus reducing the engineering effort required to obtain efficient grasping policies. We train our model with data from about 6,450 grasping trials on a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger. Across extensive experiments, our approach outperforms a variety of baselines at (i) estimating grasp adjustment outcomes, (ii) selecting efficient grasp adjustments for quick grasping, and (iii) reducing the amount of force applied at the fingers, while maintaining competitive performance. Finally, we study the choices made by our model and show that it has successfully acquired useful and interpretable grasping behaviors.

Citations (305)

Summary

  • The paper introduces an end-to-end action-conditional model that uses raw visuo-tactile inputs to iteratively adjust a robot's grasping configuration.
  • This visuo-tactile method outperforms baselines, achieves gentle force application, and shows high success rates on unseen objects.
  • Integrating tactile sensing with vision significantly enhances robotic grasping capabilities, enabling more sophisticated systems for handling diverse and fragile items.

Learning to Grasp and Regrasp with Visuo-Tactile Sensing

The paper introduces an approach to robotic grasping that leverages both visual and tactile feedback to improve grasp execution. The primary contribution is an end-to-end action-conditional model that uses raw visuo-tactile inputs to iteratively adjust a robot's grasping configuration. This work marks an important step forward in robotics, especially for the nuanced task of grasping and manipulating objects using sensory feedback.

Methodology Overview

The paper proposes an end-to-end approach in which a deep, multimodal convolutional neural network predicts the outcome of a candidate grasp adjustment action. Using these predictions, the system selects the actions most likely to produce a successful grasp. The model operates without calibration of the tactile sensors or analytical modeling of contact forces, significantly reducing the complexity and engineering effort typically involved in designing efficient grasping systems.
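
To make the action-conditional idea concrete, the sketch below shows a minimal PyTorch outcome predictor that fuses an RGB image, a tactile image, and a candidate grasp adjustment, plus a greedy selection step over sampled adjustments. This is not the authors' released architecture; the encoder sizes, the five-dimensional action encoding, and the `select_best_action` helper are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact architecture): an action-conditional
# grasp-outcome predictor. It scores how likely a candidate grasp adjustment
# is to succeed, given the current visual and tactile observations.
import torch
import torch.nn as nn


class GraspOutcomePredictor(nn.Module):
    def __init__(self, action_dim: int = 5):  # action_dim is an assumption
        super().__init__()

        def encoder():
            # Small convolutional encoder, shared structure for both modalities.
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        self.rgb_enc = encoder()       # camera image branch
        self.tactile_enc = encoder()   # GelSight image branch
        # Embed the candidate adjustment (e.g. translation, rotation, force).
        self.action_enc = nn.Sequential(nn.Linear(action_dim, 32), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(32 + 32 + 32, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit of grasp success
        )

    def forward(self, rgb, tactile, action):
        feats = torch.cat(
            [self.rgb_enc(rgb), self.tactile_enc(tactile), self.action_enc(action)],
            dim=-1,
        )
        return self.head(feats).squeeze(-1)


def select_best_action(model, rgb, tactile, candidate_actions):
    """Greedy regrasp step: score each sampled adjustment and return the one
    with the highest predicted success logit."""
    with torch.no_grad():
        n = candidate_actions.shape[0]
        logits = model(
            rgb.expand(n, -1, -1, -1),
            tactile.expand(n, -1, -1, -1),
            candidate_actions,
        )
        return candidate_actions[logits.argmax()]


if __name__ == "__main__":
    model = GraspOutcomePredictor()
    rgb = torch.rand(1, 3, 64, 64)      # current camera image
    tactile = torch.rand(1, 3, 64, 64)  # current tactile reading
    candidates = torch.rand(8, 5)       # sampled grasp adjustments
    print(select_best_action(model, rgb, tactile, candidates))
```

In a regrasp loop of the kind the paper describes, such a predictor would be queried repeatedly: after each executed adjustment, new observations are captured and the most promising next action is selected again.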

Training uses a dataset of approximately 6,450 grasping trials collected with a two-finger gripper equipped with GelSight sensors, which provide high-resolution tactile feedback. The model is fed both raw RGB images and tactile readings, relying on the complementary information in each modality for effective decision-making.
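
Under the same assumptions as the sketch above, training reduces to a standard supervised setup: each recorded trial contributes tuples of RGB image, tactile image, executed adjustment, and binary grasp outcome, and the predictor is fit with binary cross-entropy on the success label. The `train_step` helper below is a hypothetical illustration, not the paper's training code.

```python
import torch.nn.functional as F


def train_step(model, optimizer, batch):
    # batch: tensors of RGB images, tactile images, executed adjustments,
    # and binary grasp outcomes (1.0 if the object was still held after lifting).
    rgb, tactile, action, success = batch
    logits = model(rgb, tactile, action)
    loss = F.binary_cross_entropy_with_logits(logits, success.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```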

Experimental Results and Findings

The research presents strong empirical evidence for the efficacy of the proposed method. In experiments, the visuo-tactile approach outperformed baseline methods both at estimating grasp adjustment outcomes and at selecting efficient, quick grasp adjustments with reduced force application. The model's ability to apply force gently is particularly notable, offering a practical solution for handling fragile objects, which is crucial in many real-world applications.

The paper reports a high success rate in grasping trials with previously unseen objects, underscoring the model's generalization capabilities. An analysis of the model's chosen actions also shows that it has acquired interpretable grasping behaviors, an essential property for further progress in autonomous robot learning.

Theoretical and Practical Implications

At its core, this research underscores the significance of integrating tactile sensing with vision to enhance robotic interaction capabilities. The paper advances the fields of perception and manipulation in robotics by successfully bridging visual inputs, tactile sensations, and machine learning for developing adaptive grasping strategies.

From a practical standpoint, this approach potentially paves the way for more sophisticated robotic systems capable of operating in complex, dynamic environments. The ability to adjust grasps dynamically via sensory feedback can lead to improvements in automation tasks ranging from industrial applications to service robots interacting with diverse and fragile items.

Future Directions

This work suggests several avenues for future research. One is integrating more diverse sensor modalities or applying reinforcement learning techniques that could further improve performance through continuous interaction. Another promising direction is extending the approach to cluttered environments and to more flexible robotic platforms that benefit from torque control and finer motion adjustments, which could enable more human-like dexterity in robotic systems.

In sum, this paper enriches the toolbox of robotics researchers and practitioners by demonstrating the utility of combining visual and tactile sensory inputs within a deep learning framework for grasping and manipulation. Through this interplay of modalities and machine intelligence, the research brings robotic capabilities closer to handling the intricacies of real-world task execution with efficiency and adaptability.
