ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera (2405.05648v2)
Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This problem is important yet remains unsolved in robotics because depth cameras fail to recover accurate geometry for such objects. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp employs a two-layer learning-based stereo network for transparent object reconstruction, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D based grasp detection methods, which heavily depend on depth restoration networks and the quality of depth maps generated by depth cameras, our system directly uses raw IR and RGB images to reconstruct transparent object geometry. We create an extensive synthetic dataset through domain randomization, built on GraspNet-1Billion. Our experiments demonstrate that ASGrasp achieves over a 90% success rate for generalizable transparent object grasping in both simulation and the real world via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs. Project page: https://pku-epic.github.io/ASGrasp
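To make the "two-layer" idea concrete, here is a minimal PyTorch sketch of a network that takes the raw IR stereo pair and the RGB image as input and predicts two depth layers: the visible first surface and the occluded second surface behind it. This is an illustrative toy, not ASGrasp's actual architecture; every name in it (TwoLayerStereoSketch, feat_dim, the plain convolutional stems) is hypothetical, it ignores IR/RGB viewpoint alignment, and a real stereo backbone would build a cost volume with iterative refinement in the style of RAFT-Stereo.

```python
# Minimal sketch (not the authors' code) of the two-layer stereo idea:
# consume the raw left/right IR pair plus the RGB frame directly and
# regress two depth maps per pixel, one for the first visible surface
# and one for the occluded second surface of a transparent object.
# All names and shapes here are hypothetical placeholders.
import torch
import torch.nn as nn


class TwoLayerStereoSketch(nn.Module):
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        # Shared feature extractor applied to each single-channel IR view.
        self.ir_stem = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Separate stem for the 3-channel RGB image.
        self.rgb_stem = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Fuse the three feature maps and predict two depth layers.
        # A real stereo network would replace this with cost-volume
        # matching and iterative refinement rather than plain convs.
        self.head = nn.Conv2d(3 * feat_dim, 2, 3, padding=1)

    def forward(self, ir_left, ir_right, rgb):
        feats = torch.cat(
            [self.ir_stem(ir_left), self.ir_stem(ir_right), self.rgb_stem(rgb)],
            dim=1,
        )
        depth = self.head(feats)             # (B, 2, H, W)
        return depth[:, :1], depth[:, 1:]    # first surface, second surface


# Raw IR pair + RGB go in directly; no sensor depth map is needed,
# which is the property that makes this robust to transparent objects.
net = TwoLayerStereoSketch()
d_first, d_second = net(
    torch.rand(1, 1, 240, 320),  # left IR
    torch.rand(1, 1, 240, 320),  # right IR
    torch.rand(1, 3, 240, 320),  # RGB
)
```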
- H.-S. Fang, C. Wang, M. Gou, and C. Lu, “GraspNet-1Billion: A large-scale benchmark for general object grasping,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11444–11453, 2020.
- M. Breyer, J. J. Chung, L. Ott, R. Siegwart, and J. Nieto, “Volumetric grasping network: Real-time 6 DOF grasp detection in clutter,” in Conference on Robot Learning, pp. 1602–1611, PMLR, 2021.
- Z. Jiang, Y. Zhu, M. Svetlik, K. Fang, and Y. Zhu, “Synergies between affordance and geometry: 6-DoF grasp detection via implicit representations,” arXiv preprint arXiv:2104.01542, 2021.
- M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13438–13444, IEEE, 2021.
- M. Gou, H.-S. Fang, Z. Zhu, S. Xu, C. Wang, and C. Lu, “RGB matters: Learning 7-DoF grasp poses on monocular RGBD images,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13459–13466, IEEE, 2021.
- C. Wang, H.-S. Fang, M. Gou, H. Fang, J. Gao, and C. Lu, “Graspness discovery in clutters for fast and accurate grasp detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15964–15973, 2021.
- Q. Dai, J. Zhang, Q. Li, T. Wu, H. Dong, Z. Liu, P. Tan, and H. Wang, “Domain randomization-enhanced depth simulation and restoration for perceiving and grasping specular and transparent objects,” in European Conference on Computer Vision, pp. 374–391, Springer, 2022.
- H. Fang, H.-S. Fang, S. Xu, and C. Lu, “TransCG: A large-scale real-world dataset for transparent object depth completion and a grasping baseline,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7383–7390, 2022.
- S. Sajjan, M. Moore, M. Pan, G. Nagaraja, J. Lee, A. Zeng, and S. Song, “ClearGrasp: 3D shape estimation of transparent objects for manipulation,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 3634–3642, IEEE, 2020.
- Q. Dai, Y. Zhu, Y. Geng, C. Ruan, J. Zhang, and H. Wang, “GraspNeRF: Multiview-based 6-DoF grasp detection for transparent and specular objects using generalizable NeRF,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 1757–1763, IEEE, 2023.
- J. Kerr, L. Fu, H. Huang, Y. Avigal, M. Tancik, J. Ichnowski, A. Kanazawa, and K. Goldberg, “Evo-NeRF: Evolving NeRF for sequential robot grasping of transparent objects,” in 6th Annual Conference on Robot Learning, 2022.
- Y. Zhang, S. Khamis, C. Rhemann, J. Valentin, A. Kowdle, V. Tankovich, M. Schoenberg, S. Izadi, T. Funkhouser, and S. Fanello, “ActiveStereoNet: End-to-end self-supervised learning for active stereo systems,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–801, 2018.
- L. Lipson, Z. Teed, and J. Deng, “RAFT-Stereo: Multilevel recurrent field transforms for stereo matching,” in 2021 International Conference on 3D Vision (3DV), pp. 218–227, IEEE, 2021.
- D. Shin, Z. Ren, E. B. Sudderth, and C. C. Fowlkes, “3D scene reconstruction with multi-layer depth and epipolar transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2172–2182, 2019.
- K. Konolige, “Projected texture stereo,” in 2010 IEEE International Conference on Robotics and Automation, pp. 148–155, IEEE, 2010.
- I. Liu, E. Yang, J. Tao, R. Chen, X. Zhang, Q. Ran, Z. Liu, and H. Su, “ActiveZero: Mixed domain learning for active stereovision with zero annotation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13033–13042, 2022.
- L. Keselman, J. Iselin Woodfill, A. Grunnet-Jepsen, and A. Bhowmik, “Intel RealSense stereoscopic depth cameras,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–10, 2017.
- Y. Tang, J. Chen, Z. Yang, Z. Lin, Q. Li, and W. Liu, “DepthGrasp: Depth completion of transparent objects using self-attentive adversarial network with spectral residual for grasping,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5710–5716, IEEE, 2021.
- L. Zhu, A. Mousavian, Y. Xiang, H. Mazhar, J. van Eenbergen, S. Debnath, and D. Fox, “RGB-D local implicit function for depth completion of transparent objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4649–4658, 2021.
- J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp synthesis: A survey,” IEEE Transactions on Robotics, vol. 30, no. 2, pp. 289–309, 2013.
- H.-S. Fang, C. Wang, H. Fang, M. Gou, J. Liu, H. Yan, W. Liu, Y. Xie, and C. Lu, “AnyGrasp: Robust and efficient grasp perception in spatial and temporal domains,” IEEE Transactions on Robotics, 2023.
- Z. Ma, Z. Teed, and J. Deng, “Multiview stereo with cascaded epipolar RAFT,” in European Conference on Computer Vision, pp. 734–750, Springer, 2022.
- F. Wang, S. Galliani, C. Vogel, and M. Pollefeys, “IterMVS: Iterative probability estimation for efficient multi-view stereo,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8606–8615, 2022.
- X. Liu, R. Jonschkowski, A. Angelova, and K. Konolige, “KeyPose: Multi-view 3D labeling and keypoint estimation for transparent objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11602–11610, 2020.
- Blender Online Community, “Blender: A 3D modelling and rendering package,” Blender Foundation, 2018.
- E. Coumans and Y. Bai, “PyBullet, a Python module for physics simulation for games, robotics and machine learning,” 2016.
- J. Park, K. Joo, Z. Hu, C.-K. Liu, and I. So Kweon, “Non-local spatial propagation network for depth completion,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII, pp. 120–136, Springer, 2020.