
Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation (2403.11511v1)

Published 18 Mar 2024 in cs.RO and cs.CV

Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. To this end, we present a global-to-local method that addresses the hybrid domain gaps in RGB and depth data and the insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for the RGB and depth networks. We then propose a global-to-local alignment pipeline with individual global domain classifiers for the scene features of the RGB and depth images, as well as a local classifier that works specifically on grasp features in the two modalities. In particular, we propose a grasp prototype adaptation module, which facilitates fine-grained local feature alignment by dynamically updating and matching grasp prototypes from the simulation and real-world scenarios throughout training. With these designs, the proposed method substantially reduces the domain shift and leads to consistent performance improvements. Extensive experiments on the GraspNet-Planar benchmark and in a physical environment demonstrate the effectiveness of our method.
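
The abstract names two alignment mechanisms: adversarial global domain classifiers on scene features and a grasp prototype adaptation module that matches local grasp features across simulation and the real world. The snippet below is not the authors' implementation; it is a minimal PyTorch sketch, under assumed feature dimensions and hyperparameters, of how such components are commonly built: a DANN-style domain classifier trained through a gradient-reversal layer, and an EMA-updated prototype bank with a cross-domain matching loss. All class names, momentum values, and loss choices are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): a gradient-reversal domain
# classifier for global scene features and an EMA grasp-prototype bank for
# local feature alignment. Names and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class GlobalDomainClassifier(nn.Module):
    """Predicts sim (0) vs. real (1) from a global scene feature of one modality."""
    def __init__(self, feat_dim, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(inplace=True), nn.Linear(256, 1)
        )

    def forward(self, feat, domain_label):
        # Gradient reversal makes the feature extractor fool this classifier,
        # pushing sim and real scene-feature distributions together.
        logits = self.net(GradReverse.apply(feat, self.lam)).squeeze(-1)
        target = torch.full_like(logits, float(domain_label))
        return F.binary_cross_entropy_with_logits(logits, target)


class GraspPrototypeBank(nn.Module):
    """Keeps EMA prototypes of grasp features per domain and pulls them together."""
    def __init__(self, feat_dim, momentum=0.9):
        super().__init__()
        self.momentum = momentum
        self.register_buffer("proto_sim", torch.zeros(feat_dim))
        self.register_buffer("proto_real", torch.zeros(feat_dim))

    @torch.no_grad()
    def update(self, grasp_feats, domain):
        # grasp_feats: (N, feat_dim) local features pooled at predicted grasp regions.
        mean = grasp_feats.mean(dim=0)
        proto = self.proto_sim if domain == "sim" else self.proto_real
        proto.mul_(self.momentum).add_(mean, alpha=1.0 - self.momentum)

    def matching_loss(self, grasp_feats, domain):
        # Pull the current batch's mean grasp feature toward the prototype of the
        # other domain (detached), so gradients flow only through live features.
        target = self.proto_real if domain == "sim" else self.proto_sim
        mean = grasp_feats.mean(dim=0)
        return 1.0 - F.cosine_similarity(mean, target.detach(), dim=0)
```

In such a setup, the simulated batch would carry the supervised grasp-detection loss, while both simulated and real batches contribute the adversarial and prototype-matching terms; the paper's actual pipeline additionally uses per-modality classifiers for RGB and depth and a rotation-prediction pre-training stage not shown here.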
