DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects (2404.12524v1)
Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. Accurately predicting the topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby enables planning of robotic manipulation: selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.