
Shape Completion Enabled Robotic Grasping (1609.08546v2)

Published 27 Sep 2016 in cs.RO

Abstract: This work provides an architecture to enable robotic grasp planning via shape completion. Shape completion is accomplished through the use of a 3D convolutional neural network (CNN). The network is trained on our own new open source dataset of over 440,000 3D exemplars captured from varying viewpoints. At runtime, a 2.5D pointcloud captured from a single point of view is fed into the CNN, which fills in the occluded regions of the scene, allowing grasps to be planned and executed on the completed object. Runtime shape completion is very rapid because most of the computational costs of shape completion are borne during offline training. We explore how the quality of completions vary based on several factors. These include whether or not the object being completed existed in the training data and how many object models were used to train the network. We also look at the ability of the network to generalize to novel objects allowing the system to complete previously unseen objects at runtime. Finally, experimentation is done both in simulation and on actual robotic hardware to explore the relationship between completion quality and the utility of the completed mesh model for grasping.

Authors (5)
  1. Jacob Varley (14 papers)
  2. Chad DeChant (6 papers)
  3. Adam Richardson (4 papers)
  4. Joaquín Ruales (2 papers)
  5. Peter Allen (48 papers)
Citations (294)

Summary

  • The paper introduces a novel 3D CNN architecture for shape completion that significantly improves robotic grasping from partial views.
  • It utilizes an extensive training set of over 440K synthetic voxel grids and a fast marching cubes algorithm for rapid mesh generation.
  • Evaluations using metrics like Jaccard similarity and Hausdorff distance confirm enhanced grasp stability, highlighting its potential for autonomous robotics.

An Analysis of "Shape Completion Enabled Robotic Grasping"

This paper introduces an architecture that augments robotic grasping capabilities through shape completion. The core approach trains a 3D convolutional neural network (CNN) to complete object shapes from a single viewpoint, addressing common challenges in robotic grasping such as occlusion and incomplete geometric data. By reconstructing the full geometry of target objects, the proposed system supports efficient robotic interaction, from grasp planning through execution.

Overview of Research Methodology

The proposed architecture operates in two primary stages: training and runtime. The training phase employs a 3D CNN trained on a newly developed open-source dataset of over 440,000 3D exemplars. These exemplars are derived from synthetic depth images of various objects rendered from different viewpoints. Occupancy grids serve as both input and output during training: input grids encode only the visible portions of each object, while output grids contain the full object geometry.
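The input/output pairing above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the `voxelize` helper and the 40³ grid resolution are assumptions, and a real pipeline would voxelize rendered depth images rather than a toy cube.

```python
import numpy as np

def voxelize(points, grid_size=40, bounds=None):
    """Map a point cloud (N x 3) into a binary occupancy grid.

    Hypothetical helper: points are scaled into the grid using
    `bounds` (min corner, extent), then each occupied voxel is
    marked True. A 40^3 resolution is assumed here.
    """
    pts = np.asarray(points, dtype=float)
    if bounds is None:
        mins = pts.min(axis=0)
        span = pts.max(axis=0) - mins
    else:
        mins, span = bounds
    span = np.where(span == 0, 1.0, span)
    idx = ((pts - mins) / span * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Toy training pair: the camera-facing half of a cube serves as the
# network input, while the full cube is the ground-truth target.
full = np.array([[x, y, z] for x in range(10)
                 for y in range(10) for z in range(10)], dtype=float)
visible = full[full[:, 2] >= 5]          # half facing the sensor
shared = (full.min(axis=0), full.max(axis=0) - full.min(axis=0))
x_in = voxelize(visible, bounds=shared)  # partial occupancy grid (CNN input)
y_out = voxelize(full, bounds=shared)    # complete occupancy grid (CNN target)
```

Sharing one bounding box between the pair keeps the partial and complete grids aligned, so every voxel observed in the input is also occupied in the target.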

At runtime, the system captures a 2.5D point cloud with a depth sensor. The CNN fills in the occluded regions quickly, since most of the computational cost of shape completion is borne during offline training. The system then applies a fast marching cubes algorithm for rapid mesh generation, enabling stable grasps to be planned. Testing in simulation and on physical hardware validated the approach, confirming a strong correlation between completion quality and the utility of the resulting mesh model for grasping.
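One natural post-processing step at runtime is to fuse the sensor's observed voxels with the network's probabilistic output before meshing. The sketch below is an assumption about how such a fusion might look, not the paper's implementation; `complete_shape` and the 0.5 threshold are hypothetical.

```python
import numpy as np

def complete_shape(observed, predicted, threshold=0.5):
    """Fuse an observed occupancy grid with a CNN's per-voxel output.

    Hypothetical sketch: voxels the sensor actually saw are trusted
    outright, while occluded voxels are filled in wherever the
    network's predicted occupancy exceeds `threshold`. The resulting
    binary grid could then be passed to a marching cubes routine
    (e.g. skimage.measure.marching_cubes) to extract a triangle mesh.
    """
    return observed | (predicted > threshold)

# Toy demonstration on a 4^3 grid.
observed = np.zeros((4, 4, 4), dtype=bool)
observed[0, 0, 0] = True                  # voxel seen by the depth sensor
predicted = np.zeros((4, 4, 4))
predicted[1, 1, 1] = 0.9                  # confident occluded prediction
predicted[2, 2, 2] = 0.2                  # low-confidence prediction
completed = complete_shape(observed, predicted)
```

Keeping observed voxels verbatim preserves the high-resolution visible geometry, while the threshold governs how aggressively occluded space is filled.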

Technical Contributions and Results

The paper makes three main contributions:

  1. Novel CNN Architecture: The development of a 3D CNN tailored for rapid shape completion, trained to generate output effectively even for geometrically complex objects.
  2. Extensive Open Source Dataset: A large dataset consisting of over 440,000 voxel grids has been made open-source, which can be instrumental for further research in shape completion.
  3. Fast and Detailed Completion Methods: The methods include both rapid mesh completion for non-grasp purposes and a more detailed mesh generation for precise grasping.

Quantitative evaluation focused on metrics such as Jaccard similarity, Hausdorff distance, and geodesic divergence to measure completion accuracy. The CNN-based approach consistently outperformed baseline methods like mirroring or partial completion. The system exhibited significant improvements in scenarios involving novel objects, highlighting its ability to generalize beyond the trained dataset.
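Two of the cited metrics are simple enough to state directly. The following is a minimal sketch of Jaccard similarity on occupancy grids and the symmetric Hausdorff distance on point sets, written from the standard definitions rather than the paper's evaluation code.

```python
import numpy as np

def jaccard(a, b):
    """Intersection over union of two binary occupancy grids."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def hausdorff(p, q):
    """Symmetric Hausdorff distance between point sets p (N x 3), q (M x 3).

    Uses a brute-force pairwise distance matrix; fine for small sets,
    but a KD-tree would be preferable at scale.
    """
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

A completion identical to the ground truth scores a Jaccard of 1.0 and a Hausdorff distance of 0, so higher Jaccard and lower Hausdorff both indicate better completions.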

Implications and Future Directions

The implications of this research are substantial for practical robotic applications in unstructured environments. By enhancing a robotic system's ability to infer occluded parts of objects, the approach improves robustness in real-world grasping tasks. The system's capacity to integrate partial views into a cohesive object representation is particularly beneficial for autonomous robots interacting with diverse object types.

Looking forward, potential avenues for future inquiry could involve the integration of Generative Adversarial Networks (GANs) to refine completion results, leveraging extensive datasets such as ShapeNet for training, and the incorporation of retrieval-based strategies that utilize the completed mesh for identifying pre-existing grasps from a database.

Overall, this work demonstrates the feasibility of employing 3D CNNs in overcoming one of the key hurdles in automated robotic manipulation: grasping with incomplete vision data. As computational resources and datasets continue to expand, methodologies like the one presented in this paper are poised to play an increasingly central role in the evolution of autonomous robotic systems.