Self-supervised 6-DoF Robot Grasping by Demonstration via Augmented Reality Teleoperation System (2404.03067v1)
Abstract: Most existing 6-DoF robot grasping solutions depend on strong supervision of grasp poses to achieve satisfactory performance, which can be laborious and impractical when the robot operates in restricted areas. To this end, we propose a self-supervised 6-DoF grasp pose detection framework built on an Augmented Reality (AR) teleoperation system, which efficiently learns from human demonstrations and predicts 6-DoF grasp poses without grasp pose annotations. Specifically, the system collects human demonstrations in the AR environment and contrastively learns a grasping strategy from them. In real-world experiments, the proposed system achieves satisfactory grasping ability and learns to grasp unknown objects within three demonstrations.
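The abstract states that the grasping strategy is learned contrastively from demonstrations. As an illustration only (not the paper's implementation), a common way to realize this is an InfoNCE-style loss that pulls an embedding toward a demonstrated "positive" grasp and pushes it away from "negative" candidates; the function name, shapes, and temperature below are assumptions for the sketch:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    anchor:    (d,) embedding of the observed grasp candidate
    positive:  (d,) embedding that should match the anchor
               (e.g. a demonstrated grasp)
    negatives: (n, d) embeddings that should not match the anchor
    Returns the scalar loss -log p(positive | anchor).
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)
    pos = normalize(positive)
    neg = normalize(negatives)

    # Cosine similarities, positive first, scaled by temperature.
    logits = np.concatenate(([a @ pos], neg @ a)) / temperature

    # Numerically stable log-softmax; the positive sits at index 0.
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Minimizing this loss over demonstration pairs yields an embedding in which demonstrated grasps score higher than arbitrary candidates, which is the general idea behind learning a grasping strategy without explicit grasp pose labels.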