D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation (2403.12861v1)

Published 19 Mar 2024 in cs.RO and cs.LG

Abstract: Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks due to the large search space and the limited task information available from a cost function. In this work, we propose D-Cubed, a novel trajectory optimisation method using a latent diffusion model (LDM) trained from a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed learns a skill-latent space that encodes short-horizon actions in the play dataset using a VAE and trains an LDM to compose the skill latents into a skill trajectory, representing a long-horizon action trajectory in the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that employs the Cross-Entropy method within the reverse diffusion process. In particular, D-Cubed samples a small number of noisy skill trajectories using the LDM for exploration and evaluates the trajectories in simulation. Then, D-Cubed selects the trajectory with the lowest cost for the subsequent reverse process. This effectively explores promising solution areas and optimises the sampled trajectories towards a target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed readily transfer to a real-world LEAP hand on a folding task.


Summary

  • The paper introduces D-Cubed, a novel trajectory optimisation method that leverages latent diffusion models to enhance dexterous deformable manipulation.
  • It employs a gradient-free guided sampling strategy with the Cross-Entropy method to efficiently explore and refine diverse skill trajectories.
  • Empirical results on six benchmark tasks demonstrate significant improvements over traditional optimisation and reinforcement learning techniques.

D-Cubed: Enhanced Trajectory Optimisation for Dexterous Manipulation via Latent Diffusion Models

Introduction

Dexterous manipulation of deformable objects has emerged as a critical challenge in robotics, demanding capabilities beyond those of traditional parallel grippers. Existing trajectory optimisation methods often fall short on such tasks, limited by the vast search space and the sparse task information available from cost functions. The paper introduces D-Cubed, a trajectory optimisation method that employs a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. The approach encodes short-horizon actions as skill latents, which the LDM composes into skill trajectories representing long-horizon behaviour. A distinguishing element of D-Cubed is its gradient-free guided sampling method, which applies the Cross-Entropy method within the reverse diffusion process to optimise trajectories for a target task.
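The skill-latent pipeline can be sketched structurally as follows. This is a minimal illustration, not the paper's implementation: random linear maps stand in for the trained VAE encoder and decoder, and the horizon, action, and latent dimensions are assumed values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

H, A, Z = 10, 24, 16   # skill horizon, action dim (e.g. hand joints), latent dim -- illustrative

# Random linear "networks" stand in for the trained VAE encoder/decoder.
W_enc = rng.normal(scale=0.1, size=(H * A, 2 * Z))
W_dec = rng.normal(scale=0.1, size=(Z, H * A))

def encode_skill(chunk):
    """Map one H-step action chunk to a latent sample (reparameterisation trick)."""
    h = chunk.reshape(-1) @ W_enc
    mu, logvar = h[:Z], h[Z:]
    return mu + np.exp(0.5 * logvar) * rng.normal(size=Z)

def decode_skill(z):
    """Decode a skill latent back into an H-step action chunk."""
    return (z @ W_dec).reshape(H, A)

# A long-horizon play trajectory is split into short-horizon chunks,
# each encoded as one skill latent ...
play_traj = rng.normal(size=(5 * H, A))            # 50 steps of hand actions
skills = [encode_skill(c) for c in play_traj.reshape(5, H, A)]

# ... and a skill trajectory decodes back to a long-horizon action trajectory.
actions = np.concatenate([decode_skill(z) for z in skills])
print(actions.shape)   # → (50, 24)
```

The key structural point is the two-level representation: the LDM never has to model raw per-timestep actions, only the much shorter sequence of skill latents.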

Methodology

D-Cubed builds on a variational autoencoder (VAE) that encodes short-horizon action sequences from a play dataset of varied hand motions into a skill-latent space. An LDM composes these skill latents into skill trajectories, which are then optimised for a specific target task. The gradient-free guided sampling strategy drives exploration of promising solution areas: at each reverse diffusion step, candidate skill trajectories are sampled, evaluated in simulation, and the lowest-cost trajectory is carried into the subsequent reverse process. Because the LDM is trained on meaningful hand motions, the sampled trajectories remain diverse yet plausible, enabling efficient exploration in complex manipulation tasks.
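As a rough illustration of the guided sampling idea (not the paper's implementation: the noise schedule and candidate-generation rule below are toy stand-ins, and a distance-to-target cost plays the role of the simulator rollout), each reverse step draws a few candidates, scores them, and keeps the lowest-cost one:

```python
import numpy as np

def cost(traj, target):
    # Toy stand-in for a simulator rollout cost (e.g. distance of the
    # final object state to the goal configuration).
    return float(np.linalg.norm(traj - target))

def guided_reverse_sampling(target, dim=8, steps=50, n_candidates=16, seed=0):
    """CEM-style elite selection inside a reverse-diffusion-like loop:
    sample a few noisy candidates per step, evaluate each, and carry
    the lowest-cost one into the next (less noisy) step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)                 # start from pure noise
    for t in range(steps, 0, -1):
        sigma = t / steps                    # toy annealed noise schedule
        # Candidates: the current sample plus noisy perturbations of it,
        # so the kept cost never increases between steps.
        candidates = np.vstack([x, x + sigma * rng.normal(size=(n_candidates, dim))])
        costs = [cost(c, target) for c in candidates]
        x = candidates[int(np.argmin(costs))]  # elite selection: keep the best
    return x

target = np.ones(8)
traj = guided_reverse_sampling(target)
print(round(cost(traj, target), 3))
```

The selection step is what substitutes for a gradient: the cost function only needs to be evaluable on rolled-out trajectories, not differentiable, which is what makes the approach applicable when only limited task information is available from the cost.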

Empirical Evaluation

The empirical evaluation, conducted on a public benchmark suite of six dexterous deformable object manipulation tasks, demonstrates D-Cubed's superior performance over existing trajectory optimisation and reinforcement learning approaches. The method outperforms these baselines by a significant margin, indicating effective exploration and exploitation in complex manipulation scenarios. Ablation studies further confirm the importance of the skill-latent space and quantify the contribution of D-Cubed's individual components.

Implications and Future Directions

The introduction of D-Cubed represents a significant advancement in trajectory optimisation for dexterous manipulation tasks. By effectively leveraging latent skill representations and adopting a novel gradient-free guided sampling method, D-Cubed opens new avenues for research and application in robotic manipulation. The method's ability to explore and exploit large search spaces presents opportunities for tackling a wide range of complex manipulation tasks beyond deformable object handling. Future work may focus on extending D-Cubed's capabilities to more generalized manipulation tasks, improving computational efficiency, and exploring real-world applications. Additionally, further research into improving the transferability of trajectories from simulation to real-world environments could enhance the practical applicability of D-Cubed in robotics.

Conclusion

D-Cubed introduces a new approach to trajectory optimisation for dexterous deformable object manipulation, combining latent diffusion models with a novel gradient-free sampling method. Its ability to efficiently explore and optimise trajectories in complex manipulation tasks marks a notable step forward in robotics research, with potential impact across a broad range of applications, and offers a promising pathway for future advances in robotic manipulation.