Adapting Skills to Novel Grasps: A Self-Supervised Approach (2408.00178v1)
Abstract: In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g., tools) that are defined for a single grasp pose to novel grasp poses. A common approach is to explicitly define a new trajectory for each possible grasp, but this is highly inefficient. Instead, we propose a method to adapt such trajectories directly, requiring only a period of self-supervised data collection during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), can work with RGB images, depth images, or both, and requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images, including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method achieves an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps on several everyday tasks. Videos of the experiments are available on our webpage at https://www.robot-learning.uk/adapting-skills
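To make the adaptation problem concrete, the sketch below shows one standard way a trajectory can be re-targeted to a novel grasp once the object's pose relative to the gripper is known for both the demonstration grasp and the new grasp. This is a minimal illustration of the underlying rigid-body geometry, not the paper's actual pipeline: the function and variable names, the 4x4 homogeneous-transform representation, and the assumption that the method's output can be summarized as an object-in-gripper pose estimate are all assumptions made for this example.

```python
import numpy as np


def adapt_trajectory(ee_waypoints, T_obj_in_ee_demo, T_obj_in_ee_novel):
    """Re-target an end-effector trajectory to a novel grasp (illustrative sketch).

    All poses are 4x4 homogeneous transforms; waypoints are in the world frame.
    Under the demonstration grasp, the object follows
        T_obj = T_ee @ T_obj_in_ee_demo.
    To make the object retrace the same motion with the novel grasp we require
        T_ee_new @ T_obj_in_ee_novel = T_ee @ T_obj_in_ee_demo,
    hence
        T_ee_new = T_ee @ T_obj_in_ee_demo @ inv(T_obj_in_ee_novel).
    """
    correction = T_obj_in_ee_demo @ np.linalg.inv(T_obj_in_ee_novel)
    return [T_ee @ correction for T_ee in ee_waypoints]


if __name__ == "__main__":
    # Toy example: one demonstrated waypoint, and a novel grasp rotated
    # 90 degrees about the gripper z-axis relative to the demonstration grasp.
    T_ee = np.eye(4)
    T_demo = np.eye(4)
    T_novel = np.eye(4)
    c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
    T_novel[:3, :3] = np.array([[c, -s, 0],
                                [s,  c, 0],
                                [0,  0, 1]])

    adapted = adapt_trajectory([T_ee], T_demo, T_novel)
    print(adapted[0])  # waypoint rotated to compensate for the grasp offset
```

The interesting part of the paper is precisely what this sketch takes as given: estimating the relative grasp change without a CAD model or camera calibration, from self-supervised RGB (and optionally depth) observations of the grasped object.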