Volumetric Grasping Network: Real-time 6 DOF Grasp Detection in Clutter (2101.01132v1)

Published 4 Jan 2021 in cs.RO

Abstract: General robot grasping in clutter requires the ability to synthesize grasps that work for previously unseen objects and that are also robust to physical interactions, such as collisions with other objects in the scene. In this work, we design and train a network that predicts 6 DOF grasps from 3D scene information gathered from an on-board sensor such as a wrist-mounted depth camera. Our proposed Volumetric Grasping Network (VGN) accepts a Truncated Signed Distance Function (TSDF) representation of the scene and directly outputs the predicted grasp quality and the associated gripper orientation and opening width for each voxel in the queried 3D volume. We show that our approach can plan grasps in only 10 ms and is able to clear 92% of the objects in real-world clutter removal experiments without the need for explicit collision checking. The real-time capability opens up the possibility for closed-loop grasp planning, allowing robots to handle disturbances, recover from errors and provide increased robustness. Code is available at https://github.com/ethz-asl/vgn.

Citations (150)

Summary

  • The paper presents a fully convolutional network (FCN) that operates on a TSDF representation of the scene to predict grasp quality, gripper orientation, and opening width for every voxel.
  • Inference takes only 10 ms, reducing grasp planning from the seconds required by traditional methods such as the GPD algorithm to milliseconds.
  • The method bridges simulation and real-world applications by transferring models trained on synthetic data to physical robotic setups.

An Analysis of the Volumetric Grasping Network for Real-Time 6 DOF Grasp Detection

The paper "Volumetric Grasping Network: Real-time 6 DOF Grasp Detection in Clutter" presents a novel approach to enhance robotic grasping capabilities in cluttered environments using the Volumetric Grasping Network (VGN). This paper addresses the need for real-time synthesis of grasps with six degrees of freedom (DOF) directly from 3D scene information. The VGN is designed to improve upon limitations identified in current grasp detection methods, particularly those concerned with efficiency and the ability to handle densely packed scenes.

Methodological Advancements

The key contribution of this research is a Fully Convolutional Network (FCN) that operates on a Truncated Signed Distance Function (TSDF) representation of the input scene. This volumetric embedding of the grasping workspace lets the network predict grasp quality, gripper orientation, and opening width for every voxel in real time. Inference within 10 ms on a GPU is particularly notable: it surpasses existing techniques such as the Grasp Pose Detection (GPD) algorithm, reducing computation time from seconds to milliseconds.
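To make this concrete, the following is a minimal PyTorch sketch of such a multi-head fully convolutional network. The layer sizes, the 40x40x40 grid resolution, and the quaternion encoding of orientation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspFCN(nn.Module):
    """Minimal sketch of a multi-head fully convolutional grasp network.

    Input: a (B, 1, 40, 40, 40) TSDF volume. Outputs: per-voxel grasp
    quality, orientation (unit quaternion), and gripper opening width.
    Layer sizes and the 40x40x40 resolution are illustrative assumptions.
    """

    def __init__(self):
        super().__init__()
        # Encoder: strided 3D convolutions downsample 40 -> 20 -> 10.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample 10 -> 20 -> 40, so the
        # output volumes are spatially aligned with the input TSDF.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        # Separate heads predict each grasp parameter per voxel.
        self.quality_head = nn.Conv3d(16, 1, kernel_size=3, padding=1)
        self.rotation_head = nn.Conv3d(16, 4, kernel_size=3, padding=1)
        self.width_head = nn.Conv3d(16, 1, kernel_size=3, padding=1)

    def forward(self, tsdf):
        feat = self.decoder(self.encoder(tsdf))
        quality = torch.sigmoid(self.quality_head(feat))          # in [0, 1]
        rotation = F.normalize(self.rotation_head(feat), dim=1)   # unit quaternion
        width = self.width_head(feat)                             # opening width
        return quality, rotation, width
```

Because the network is fully convolutional, a single forward pass yields all three output volumes at the input's spatial resolution, which is what makes dense per-voxel prediction cheap enough for the reported 10 ms budget.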

Key methodological advancements introduced in this paper include:

  • TSDF Utilization: Employing TSDFs for sensor data integration promotes robustness by smoothing out sensor noise and providing consistent 3D scene geometry as input to the network.
  • Network Architecture: The FCN's multiple output heads predict, per voxel, the grasp quality, gripper orientation, and opening width (as sketched above), yielding a dense 6 DOF grasp parameterization of the workspace.
  • Real-Time Implementation: By leveraging the GPU's processing power, the method delivers real-time grasp planning, which is crucial for dynamic and reactive manipulation in cluttered settings; a sketch of how executable grasps can be read out of the predicted volumes follows this list.
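Turning the three output volumes into executable grasps requires a post-processing step. The sketch below shows one plausible pipeline (Gaussian smoothing of the quality volume, thresholding, and local-maximum selection); the specific threshold, smoothing width, and voxel size are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def select_grasps(quality, rotation, width, threshold=0.9, voxel_size=0.0075):
    """Read grasp candidates out of the network's per-voxel output volumes.

    quality, width: (D, H, W) arrays; rotation: (4, D, H, W) quaternions.
    The threshold, smoothing width, and voxel size are illustrative guesses.
    """
    # Smooth the quality volume to suppress isolated spurious peaks.
    q = ndimage.gaussian_filter(quality, sigma=1.0)

    # Keep voxels that are both confident and local maxima of quality.
    mask = (q > threshold) & (q == ndimage.maximum_filter(q, size=3))

    grasps = []
    for i, j, k in np.argwhere(mask):
        grasps.append({
            "position": (np.array([i, j, k]) + 0.5) * voxel_size,  # meters
            "quaternion": rotation[:, i, j, k],
            "width": float(width[i, j, k]),
            "quality": float(q[i, j, k]),
        })
    # Return best-first so the caller can execute grasps[0].
    return sorted(grasps, key=lambda g: -g["quality"])
```

Because both inference and this post-processing are cheap, the whole pipeline can be re-run on every new sensor frame, which is what enables the closed-loop grasp planning the authors highlight.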

Experimental Results and Observations

The experiments conducted, both in simulation and with physical robotic setups, validate the VGN's efficiency and efficacy. Notable outcomes include:

  • High Success Rates: In clutter removal tasks, VGN achieved high grasp success rates, both in scenes of geometric primitives and in complex scenes with diverse objects, clearing 92% of objects in the real-world experiments.
  • Effective in Diverse Clutter: The system's ability to execute side grasps and other non-trivial grasps in packed environments marks a substantial advance over traditional top-down-only approaches.
  • Simulation-to-Real-World Transfer: Importantly, the paper shows that a model trained solely on synthetic data transfers effectively to real-world applications without retraining.

Implications and Future Directions

The research has significant implications for advancing robotic manipulation capabilities, particularly in unstructured environments and applications where real-time decision-making is crucial, such as warehouse automation or assistive robotics in healthcare.

Looking forward, several avenues could further enhance this work:

  1. Adversarial Simulations: Introducing more complex simulation dynamics may bridge the performance gap caused by contact-related failures observed in physical tests.
  2. Robustness Against Variability: Equipping the system to handle transparent and specular objects would markedly expand the VGN's applicability.
  3. Integration with Feedback Loops: Closing the visual feedback loop for dynamic adjustments during execution could further minimize failures related to unexpected object movements or miscalibrations, pushing the system towards more autonomous manipulation capabilities.

Overall, the VGN represents a significant step forward for robotic grasping technology, offering a promising approach to 6 DOF grasp detection that balances computational efficiency with accuracy. This research not only paves the way for more flexible robotic systems but also invites further investigation into robust, real-time object interaction in complex environments.
