Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation (2403.11511v1)
Abstract: This paper addresses the sim-to-real issue in RGB-D grasp detection and formulates it as a domain adaptation problem. To this end, we present a global-to-local method that tackles the hybrid domain gaps in RGB and depth data as well as the insufficient alignment of multi-modal features. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for the RGB and depth networks. We then propose a global-to-local alignment pipeline with individual global domain classifiers for the scene features of RGB and depth images, together with a local classifier dedicated to grasp features in the two modalities. In particular, we propose a grasp prototype adaptation module, which facilitates fine-grained local feature alignment by dynamically updating and matching grasp prototypes from the simulated and real-world scenarios throughout training. Owing to these designs, the proposed method substantially reduces the domain shift and thus yields consistent performance improvements. Extensive experiments on the GraspNet-Planar benchmark and in a physical environment achieve superior results, demonstrating the effectiveness of our method.
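The rotation pre-training step follows the standard self-supervised recipe of predicting which multiple of 90° an input was rotated by (Gidaris et al., ICLR 2018). Below is a minimal PyTorch sketch of that recipe, assuming a ResNet-50 backbone with a 4-way rotation head; the class and function names are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class RotationPretrainer(nn.Module):
    """Backbone + 4-way head that predicts the applied rotation (0/90/180/270)."""
    def __init__(self, in_channels=3):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Re-make the stem so the same recipe also works for 1-channel depth maps.
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()            # expose the 2048-d pooled feature
        self.backbone = backbone
        self.rot_head = nn.Linear(2048, 4)

    def forward(self, x):
        return self.rot_head(self.backbone(x))

def rotate_batch(images):
    """Rotate each image by a random multiple of 90 deg; return images + labels."""
    labels = torch.randint(0, 4, (images.size(0),), device=images.device)
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

# Pre-train on unlabeled images from either domain:
model = RotationPretrainer(in_channels=3)      # in_channels=1 for the depth net
images = torch.randn(8, 3, 224, 224)           # dummy unlabeled batch
rotated, labels = rotate_batch(images)
loss = nn.CrossEntropyLoss()(model(rotated), labels)
loss.backward()
```

Because the pretext task needs no annotations, the same procedure can initialize both the RGB and the depth branch before any grasp labels are used.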
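The global and local domain classifiers described in the abstract are trained adversarially; the usual way to implement this is the gradient reversal layer of DANN (Ganin & Lempitsky, ICML 2015), which the paper cites. The sketch below assumes binary sim-vs-real discrimination on pooled scene features (global, one classifier per modality) and on grasp-region features (local); the network sizes are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the backward
    pass, so the feature extractor is pushed toward domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lamb, None

class DomainClassifier(nn.Module):
    """Binary sim-vs-real discriminator placed behind gradient reversal."""
    def __init__(self, dim, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, feat):
        return self.net(GradReverse.apply(feat, self.lamb))

# One global classifier per modality plus a local one for grasp features:
d_rgb, d_depth, d_grasp = (DomainClassifier(2048) for _ in range(3))
bce = nn.BCEWithLogitsLoss()
sim_rgb = torch.randn(8, 2048, requires_grad=True)   # pooled sim scene features
real_rgb = torch.randn(8, 2048, requires_grad=True)  # pooled real scene features
adv_loss = bce(d_rgb(sim_rgb), torch.zeros(8, 1)) + \
           bce(d_rgb(real_rgb), torch.ones(8, 1))
adv_loss.backward()
```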
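For the grasp prototype adaptation module, the abstract states only that prototypes are dynamically updated and matched across domains during training. The sketch below fills in one plausible realization: an exponential-moving-average update per grasp category and an MSE matching term that pulls grasp features toward the other domain's prototypes. The momentum value, the per-category grouping (e.g. grasp-angle bins), and the reliance on pseudo-labels for real data are all assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

class GraspPrototypeBank:
    """Keeps one prototype per grasp category and per domain, updated by EMA."""
    def __init__(self, num_categories, dim, momentum=0.9):
        self.momentum = momentum
        self.protos = {"sim": torch.zeros(num_categories, dim),
                       "real": torch.zeros(num_categories, dim)}

    @torch.no_grad()
    def update(self, feats, labels, domain):
        """EMA-update the prototypes of `domain` from a batch of grasp features."""
        protos = self.protos[domain]
        for c in labels.unique():
            mean = feats[labels == c].mean(dim=0).to(protos.device)
            protos[c] = self.momentum * protos[c] + (1 - self.momentum) * mean

    def matching_loss(self, feats, labels, domain):
        """Pull each grasp feature toward the matched prototype of the *other*
        domain, aligning the two domains at the local (grasp) level."""
        other = "real" if domain == "sim" else "sim"
        target = self.protos[other].to(feats.device)
        return F.mse_loss(feats, target[labels])

bank = GraspPrototypeBank(num_categories=12, dim=256)  # 12 angle bins: assumed
sim_feats = torch.randn(32, 256, requires_grad=True)
sim_labels = torch.randint(0, 12, (32,))               # ground truth in sim;
bank.update(sim_feats, sim_labels, "sim")              # pseudo-labels for real
loss = bank.matching_loss(sim_feats, sim_labels, "sim")
```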