Diffusion for Multi-Embodiment Grasping (2410.18835v1)
Abstract: Grasping is a fundamental skill in robotics with diverse applications across medical, industrial, and domestic domains. However, current approaches for predicting valid grasps are often tailored to specific grippers, limiting their applicability when gripper designs change. To address this limitation, we explore the transfer of grasping strategies between various gripper designs, enabling the use of data from diverse sources. In this work, we present an approach based on equivariant diffusion that facilitates gripper-agnostic encoding of scenes containing graspable objects and gripper-aware decoding of grasp poses by integrating gripper geometry into the model. We also develop a dataset generation framework that produces cluttered scenes with variable-sized object heaps, improving the training of grasp synthesis methods. Experimental evaluation on diverse object datasets demonstrates the generalizability of our approach across gripper architectures, ranging from simple parallel-jaw grippers to humanoid hands, outperforming both single-gripper and multi-gripper state-of-the-art methods.
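To make the diffusion half of the abstract concrete, the sketch below shows a generic DDPM-style reverse sampling loop that draws grasp poses conditioned on a scene point cloud and the gripper's geometry. This is a minimal illustration, not the paper's implementation: every name here (`sample_grasps`, `score_fn`, the 7-D pose vector) is a hypothetical stand-in, and the actual model uses equivariant networks operating on SE(3) rather than a flat translation-plus-quaternion parameterization.

```python
# Hypothetical sketch of conditional diffusion sampling for grasp poses.
# All function and argument names are illustrative, not from the paper's code.
import numpy as np

def sample_grasps(scene_points, gripper_points, score_fn,
                  n_grasps=16, n_steps=50):
    """Reverse-diffusion sampling of grasp poses.

    scene_points:   (N, 3) point cloud of the cluttered scene
    gripper_points: (M, 3) point cloud / keypoints of the gripper geometry
    score_fn:       trained network predicting the injected noise on the
                    pose, conditioned on scene and gripper encodings
    Returns (n_grasps, 7) poses as [x, y, z, qw, qx, qy, qz]. Simplified:
    the paper works on SE(3) with equivariant layers; here the rotation is
    just a renormalized quaternion for brevity.
    """
    # Linear noise schedule (standard DDPM-style betas).
    betas = np.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure noise in the 7-D pose parameterization.
    poses = np.random.randn(n_grasps, 7)
    for t in reversed(range(n_steps)):
        # The network predicts the noise, conditioned on a gripper-agnostic
        # scene encoding and a gripper-aware conditioning signal.
        eps = score_fn(poses, t, scene_points, gripper_points)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (poses - coef * eps) / np.sqrt(alphas[t])
        noise = np.random.randn(*poses.shape) if t > 0 else 0.0
        poses = mean + np.sqrt(betas[t]) * noise
        # Keep the rotation part on the unit-quaternion manifold.
        poses[:, 3:] /= np.linalg.norm(poses[:, 3:], axis=1, keepdims=True)
    return poses
```

Conditioning the denoiser on gripper geometry (rather than training one model per gripper) is what lets a single sampler of this shape produce poses for anything from a parallel-jaw gripper to a humanoid hand.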