SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion (2209.03855v4)
Abstract: Multi-objective optimization problems are ubiquitous in robotics, e.g., the optimization of a robot manipulation task requires a joint consideration of grasp pose configurations, collisions and joint limits. While some demands can be easily hand-designed, e.g., the smoothness of a trajectory, several task-specific objectives need to be learned from data. This work introduces a method for learning data-driven SE(3) cost functions as diffusion models. Diffusion models can represent highly-expressive multimodal distributions and exhibit proper gradients over the entire space due to their score-matching training objective. Learning costs as diffusion models allows their seamless integration with other costs into a single differentiable objective function, enabling joint gradient-based motion optimization. In this work, we focus on learning SE(3) diffusion models for 6DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation. We evaluate the representation power of our SE(3) diffusion models w.r.t. classical generative models, and we showcase the superior performance of our proposed optimization framework in a series of simulated and real-world robotic manipulation tasks against representative baselines.
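The idea of folding a learned cost into a single differentiable objective alongside hand-designed costs can be illustrated with a small toy sketch. This is not the paper's implementation. It assumes 2D end positions instead of SE(3) poses, and uses the negative log-density of a two-mode Gaussian mixture as a hypothetical stand-in for the learned diffusion-based grasp cost. The names `GRASP_MODES`, `SIGMA`, `grasp_cost_grad`, `smoothness_grad`, and `optimize` are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for a learned grasp cost: the negative log-density
# of a two-mode Gaussian mixture over 2D end positions (not SE(3) poses).
GRASP_MODES = np.array([[1.0, 1.0], [1.0, -1.0]])
SIGMA = 0.5


def grasp_cost_grad(x):
    """Gradient of -log sum_k exp(-||x - mu_k||^2 / (2 sigma^2)) w.r.t. x."""
    diff = x - GRASP_MODES                              # shape (K, 2)
    logits = -np.sum(diff ** 2, axis=1) / (2 * SIGMA ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                        # mode responsibilities
    return (w[:, None] * diff).sum(axis=0) / SIGMA ** 2


def smoothness_grad(traj):
    """Gradient of 0.5 * sum_t ||traj[t+1] - traj[t]||^2 w.r.t. traj."""
    grad = np.zeros_like(traj)
    d = traj[1:] - traj[:-1]
    grad[:-1] -= d
    grad[1:] += d
    return grad


def optimize(start, n_points=10, steps=2000, lr=0.05, w_smooth=1.0):
    """Gradient descent on the sum of the smoothness and grasp costs."""
    traj = np.tile(start, (n_points, 1)).astype(float)
    for _ in range(steps):
        grad = w_smooth * smoothness_grad(traj)
        grad[-1] += grasp_cost_grad(traj[-1])   # grasp cost acts on the final waypoint
        grad[0] = 0.0                           # keep the start waypoint fixed
        traj -= lr * grad
    return traj


if __name__ == "__main__":
    upper = optimize(np.array([0.0, 0.2]))
    lower = optimize(np.array([0.0, -0.2]))
    print("end point from upper start:", upper[-1])
    print("end point from lower start:", lower[-1])
```

Because the grasp term and the smoothness term are both differentiable, their gradients are simply added. The trajectory end point is drawn toward whichever grasp mode is nearer, without a separate grasp-selection step.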