Overview of Visuomotor Controller for Robotic Grasping
The paper "Learning a visuomotor controller for real world robotic grasping using simulated depth images" presents a novel approach to enhance robotic grasping capabilities, a critical function for applications in unstructured real-world scenarios like household environments. The central focus is on addressing the key challenges of managing unexpected changes in objects and dealing with kinematic noise or errors in robots during manipulation tasks. The research converges on developing a closed-loop controller driven by a convolutional neural network (CNN) that guides robotic grippers using simulated depth images.
Methodology
The proposed method uses a wrist-mounted depth sensor to acquire images that inform grasping decisions at every step of the approach. Simulated environments built in OpenRAVE supply the training data, circumventing the prohibitive time and cost of collecting real-world examples. Rather than detecting candidate grasps in a single shot, the CNN is trained to predict the distance from a candidate motion to the nearest viable grasp configuration, a departure from one-shot grasp detection methods.
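To make the distance-regression idea concrete, the following is a minimal sketch, not the authors' published architecture: the layer sizes, the 3-D action parameterization, and the use of PyTorch are all assumptions chosen for illustration. It shows a CNN that maps a depth image and a candidate gripper motion to a scalar estimate of the distance to the nearest viable grasp.

```python
# Hypothetical sketch of a distance-regression CNN (illustrative, not the
# paper's exact architecture): a depth image plus a candidate gripper motion
# are mapped to a scalar distance-to-nearest-grasp estimate.
import torch
import torch.nn as nn

class GraspDistanceNet(nn.Module):
    def __init__(self, action_dim: int = 3):
        super().__init__()
        # Convolutional encoder for a single-channel (depth) image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fully connected head combining image features with the candidate action.
        self.head = nn.Sequential(
            nn.Linear(64 + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, depth: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # depth:  (B, 1, H, W) simulated or real depth image
        # action: (B, action_dim) candidate gripper motion
        features = self.encoder(depth)
        return self.head(torch.cat([features, action], dim=1)).squeeze(-1)

# Training target: the ground-truth distance (in action space) from the
# candidate motion to the nearest viable grasp, which simulation provides cheaply.
model = GraspDistanceNet()
loss_fn = nn.MSELoss()
```

Because the regression target is a continuous distance rather than a binary grasp/no-grasp label, the controller can rank many candidate motions and refine its command every frame.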
Key technical contributions include:
- Utilizing depth rather than RGB data to ensure robust simulation-to-reality transfer. Depth images, though less visually rich, can be rendered accurately via ray tracing, enabling effective training entirely in simulation.
- Designing a CNN that regresses the distance from a candidate motion to the nearest viable grasp, enabling the controller to update its commands continuously from immediate sensor feedback.
- Formulating the controller's objective in action space: at each time step the controller selects the candidate motion with the smallest predicted distance to a grasp and shrinks the spatial scale of candidate motions frame by frame, refining accuracy as the gripper converges (see the sketch after this list).
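The sketch below outlines one plausible closed-loop control cycle built on such a distance regressor. The 3-D motion parameterization, the number of sampled candidates, the 0.8 shrink factor, and the helper callables (`predict_distance`, `get_depth_image`, `move_gripper`) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of one closed-loop control cycle: sample candidate motions,
# score them with the learned distance regressor, execute the best one, and
# shrink the sampling scale so later steps make finer corrections.
# The three callables are hypothetical stand-ins for the learned model,
# the wrist-mounted sensor, and the robot motion API.
import numpy as np

def control_step(predict_distance, get_depth_image, move_gripper,
                 scale: float, num_candidates: int = 64) -> float:
    depth = get_depth_image()  # current wrist-camera depth image
    # Sample candidate gripper motions within the current action scale.
    candidates = np.random.uniform(-scale, scale, size=(num_candidates, 3))
    # Score each candidate by its predicted distance to the nearest viable grasp.
    scores = np.array([predict_distance(depth, a) for a in candidates])
    best = candidates[np.argmin(scores)]
    move_gripper(best)   # execute the most promising motion
    return scale * 0.8   # shrink the action scale for the next frame

# The cycle repeats until the predicted distance falls below a threshold, e.g.:
# scale = 0.10  # meters
# for _ in range(25):
#     scale = control_step(predict_distance, get_depth_image, move_gripper, scale)
```

Re-sampling and re-scoring on every frame is what lets the controller absorb kinematic noise and object motion: errors made early in the approach are corrected by later, smaller adjustments.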
Results and Analysis
The approach is evaluated through experiments both in simulation and on a physical UR5 robot. Notably, the controller significantly outperforms a strong baseline, the Grasp Pose Detection (GPD) method, under kinematic noise, demonstrating robustness to realistic disturbances such as motion errors and object shifts.
In scenarios with dense clutter or isolated object presentation, the controller achieves high grasp success rates, matching or closely approaching the performance of static GPD methods. However, it demonstrates superior adaptability in dynamic scenarios, evidenced by a drastically higher success rate when objects are intentionally repositioned after initial detection.
Implications and Future Directions
Practically, this research is a substantial step toward adaptable, autonomous robotic systems that can operate reliably in unpredictable environments. It also underscores the promise of simulated training for capturing real-world variation, and motivates further exploration of hybrid datasets that combine simulated and real depth images.
Theoretically, this work raises questions about how best to design CNN-based controllers for grasping, particularly learning policies that adjust dynamically to real-time sensor feedback. Future work may integrate domain adaptation techniques to further narrow the gap between simulated and real data, and improve the model's ability to single out a target object amid clutter for task-directed grasping.
Overall, this research demonstrates the potential of closed-loop feedback for robotic grasping, improving adaptability and efficiency in real-world applications. The work could inform future advances in autonomous robotics, particularly in domains requiring sophisticated manipulation capabilities.