DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects (2404.12524v1)

Published 18 Apr 2024 in cs.CV, cs.LG, and cs.RO

Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict the topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby enables planning of robotic manipulation: selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.
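To illustrate how a latent-space predictive model can drive this kind of planning, here is a minimal sketch under stated assumptions. The abstract does not specify the planner, so this assumes an exhaustive search over a small discrete grid of (tool, pose, opening width) candidates. `toy_step`, `toy_trajectory`, the tool names, and the Chamfer-style goal distance are hypothetical stand-ins, not DoughNet's learned model or code. What the sketch does show is the structure: an autoregressive rollout that feeds each predicted set of latent codes back as the next input, and candidates scored only on predicted latents.

```python
import itertools

import numpy as np


# Hypothetical stand-in for a learned latent dynamics model: maps the current
# set of latent codes (an N x D array) and one action to the next set.
def toy_step(latents: np.ndarray, action: np.ndarray) -> np.ndarray:
    # Toy dynamics: translate every latent code by the action vector.
    return latents + action


# Hypothetical mapping from a (tool, pose, width) candidate to an action sequence.
def toy_trajectory(tool: str, pose: float, width: float, steps: int = 5) -> list:
    direction = np.array([1.0, 0.0]) if tool == "gripper" else np.array([0.0, 1.0])
    return [direction * pose * width / steps for _ in range(steps)]


def rollout(latents: np.ndarray, trajectory: list, step_fn) -> list:
    """Autoregressive rollout: each prediction is fed back as the next input."""
    states, current = [], latents
    for action in trajectory:
        current = step_fn(current, action)
        states.append(current)
    return states


def set_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer-style distance; works for sets of different sizes."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def plan(initial, goal, tools, poses, widths, make_trajectory, step_fn):
    """Exhaustive search over (tool, pose, width); returns the lowest-cost one."""
    best, best_cost = None, float("inf")
    for tool, pose, width in itertools.product(tools, poses, widths):
        final = rollout(initial, make_trajectory(tool, pose, width), step_fn)[-1]
        cost = set_distance(final, goal)
        if cost < best_cost:
            best, best_cost = (tool, pose, width), cost
    return best, best_cost


if __name__ == "__main__":
    initial = np.zeros((3, 2))              # three latent codes in a 2-D toy space
    goal = initial + np.array([2.0, 0.0])   # goal: the same set shifted along x
    best, cost = plan(initial, goal, ["gripper", "pusher"], [0.5, 1.0, 2.0],
                      [1.0, 2.0], toy_trajectory, toy_step)
    print(best, cost)  # best is ('gripper', 1.0, 2.0)
```

Replacing `toy_step` with a trained predictor and the grid search with a sampling-based optimizer would keep the same shape. The set-based distance is one assumed choice of goal metric; because it compares sets rather than fixed-size vectors, it still applies when the number of latent codes differs between prediction and goal.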
