GraNet: A Multi-Level Graph Network for 6-DoF Grasp Pose Generation in Cluttered Scenes (2312.03345v1)

Published 6 Dec 2023 in cs.RO and cs.CV

Abstract: 6-DoF object-agnostic grasping in unstructured environments is a critical yet challenging task in robotics. Most current works use non-optimized approaches to sample grasp locations and learn spatial features without concerning the grasping task. This paper proposes GraNet, a graph-based grasp pose generation framework that translates a point cloud scene into multi-level graphs and propagates features through graph neural networks. By building graphs at the scene level, object level, and grasp point level, GraNet enhances feature embedding at multiple scales while progressively converging to the ideal grasping locations by learning. Our pipeline can thus characterize the spatial distribution of grasps in cluttered scenes, leading to a higher rate of effective grasping. Furthermore, we enhance the representation ability of scalable graph networks by a structure-aware attention mechanism to exploit local relations in graphs. Our method achieves state-of-the-art performance on the large-scale GraspNet-1Billion benchmark, especially in grasping unseen objects (+11.62 AP). The real robot experiment shows a high success rate in grasping scattered objects, verifying the effectiveness of the proposed approach in unstructured environments.

Summary

  • The paper introduces a multi-level graph network that significantly enhances 6-DoF grasp pose generation in complex, cluttered environments.
  • It employs graph neural networks on 3D point clouds to capture spatial features across scene, object, and grasp levels.
  • The paper validates GraNet with extensive experiments on the GraspNet-1Billion benchmark and real robot trials, demonstrating state-of-the-art benchmark performance and high real-world grasp success rates.

Introduction to GraNet

The development of robotic systems capable of 6-DoF grasping in cluttered environments remains a significant challenge in robotics. Many existing methods fall short in adaptability or generalization, particularly when handling unknown objects and complex scenes.

Graph Network Approach

The paper introduces GraNet, a multi-level graph network that aims to advance the state of the art in 6-DoF grasp pose generation. The approach operates on point clouds, 3D representations of a scene, to determine optimal grasping poses for robotic manipulators. By constructing graphs at the scene, object, and grasp-point levels, GraNet progressively converges on ideal grasping locations through feature propagation with graph neural networks (GNNs). This hierarchical, cascading structure improves spatial feature learning and leads to a noticeably higher rate of effective grasps in cluttered scenes. Graph-based strategies are advantageous because they model the relations between points, which strengthens the understanding of the geometric structure crucial for grasp prediction.
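
To make the graph construction concrete, the sketch below builds a k-nearest-neighbour graph over a point cloud and runs one round of edge-feature message passing, a stand-in for the scene-level stage described above. It is a minimal illustration rather than the authors' implementation; the tensor shapes, feature dimensions, and max-aggregation rule are assumptions.

import torch

def knn_graph(points, k):
    # points: (N, 3) -> (N, k) indices of the k nearest neighbours per point
    dists = torch.cdist(points, points)                     # (N, N) pairwise distances
    return dists.topk(k + 1, largest=False).indices[:, 1:]  # drop the self-match in column 0

def message_pass(feats, neighbors, mlp):
    # feats: (N, C); neighbors: (N, k); returns (N, C_out) after max-aggregation
    nbr_feats = feats[neighbors]                             # (N, k, C) gather neighbour features
    center = feats.unsqueeze(1).expand_as(nbr_feats)         # (N, k, C) repeat the centre feature
    edge = torch.cat([center, nbr_feats - center], dim=-1)   # (N, k, 2C) edge features
    return mlp(edge).max(dim=1).values                       # aggregate over the neighbourhood

# Usage on a random "scene" of 2048 points with 32-d input features
points = torch.rand(2048, 3)
feats = torch.rand(2048, 32)
mlp = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64))
neighbors = knn_graph(points, k=16)
scene_feats = message_pass(feats, neighbors, mlp)            # (2048, 64)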

GraspNet-1Billion and Performance

The effectiveness of GraNet is demonstrated on the large-scale GraspNet-1Billion dataset, which contains over a billion annotated grasp poses. The model achieves state-of-the-art performance, most notably when identifying grasps for previously unseen objects (+11.62 AP). The network pairs a structure-aware attention mechanism with multi-hop connectivity to exploit local relations in the graphs, while the learned selection of grasp points reduces redundant computation on non-graspable locations.
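
As a rough illustration of how attention over a local graph neighbourhood can weigh neighbours by their geometric relation to the centre point, the following sketch scores each neighbour from its feature and relative position and aggregates with softmax weights. The scoring MLP and the use of relative coordinates are illustrative assumptions, not the paper's exact structure-aware attention design.

import torch

def attentive_aggregate(feats, points, neighbors, score_mlp):
    # feats: (N, C); points: (N, 3); neighbors: (N, k); returns (N, C)
    nbr_feats = feats[neighbors]                                  # (N, k, C)
    rel_pos = points[neighbors] - points.unsqueeze(1)             # (N, k, 3) local geometric structure
    scores = score_mlp(torch.cat([nbr_feats, rel_pos], dim=-1))   # (N, k, 1) per-edge score
    weights = torch.softmax(scores, dim=1)                        # normalise over the k neighbours
    return (weights * nbr_feats).sum(dim=1)                       # attention-weighted aggregation

# score_mlp maps (C + 3) -> 1; e.g. for C = 64:
score_mlp = torch.nn.Sequential(torch.nn.Linear(67, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

points = torch.rand(1024, 3)
feats = torch.rand(1024, 64)
neighbors = torch.cdist(points, points).topk(17, largest=False).indices[:, 1:]  # 16-NN graph
out = attentive_aggregate(feats, points, neighbors, score_mlp)                  # (1024, 64)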

Implementation and Real Robot Experimentation

The implementation of the network incorporates several components, including graph feature embedding networks and a learning-based grasp point selection mechanism. Experimentation extends beyond the benchmark to real-world robotic trials in which GraNet grasps novel items. The reported success rates underscore the network's robustness and its potential for practical applications such as automated assembly lines, warehouse sorting, and other settings where robots handle a diverse range of objects in disordered states.
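
One way to picture the learning-based grasp point selection mentioned above is a small per-point scoring head that keeps only the top-scoring candidates for later pose estimation; the head architecture and the number of retained points below are hypothetical choices, not values from the paper.

import torch

class GraspPointSelector(torch.nn.Module):
    # Predicts a per-point "graspability" score and keeps the top-k candidates.
    def __init__(self, in_dim, num_keep=256):
        super().__init__()
        self.head = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
        self.num_keep = num_keep

    def forward(self, feats, points):
        # feats: (N, C); points: (N, 3)
        scores = self.head(feats).squeeze(-1)        # (N,) graspability per point
        idx = scores.topk(self.num_keep).indices     # indices of the most promising points
        return points[idx], feats[idx], scores[idx]

# Usage: keep 256 candidate grasp points out of 2048
selector = GraspPointSelector(in_dim=64)
points, feats = torch.rand(2048, 3), torch.rand(2048, 64)
cand_pts, cand_feats, cand_scores = selector(feats, points)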

Conclusion

The paper concludes that GraNet's multi-level graph network can effectively interpret complex scenes and select the most appropriate grasping points without prior knowledge of the objects. This marks a significant step forward in robotic grasping, paving the way for more efficient and adaptable autonomous systems. With further development, such systems could improve product handling and delivery processes across a range of industries. The integration of the grasping task into the feature extraction network emerges as a particularly powerful tactic and reflects a promising direction for future research in robotic manipulation.