RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools (2306.14447v2)
Abstract: Humans excel in complex long-horizon soft body manipulation tasks via flexible tool use: bread baking requires a knife to slice the dough and a rolling pin to flatten it. Often regarded as a hallmark of human cognition, tool use in autonomous robots remains limited due to challenges in understanding tool-object interactions. Here we develop an intelligent robotic system, RoboCook, which perceives, models, and manipulates elasto-plastic objects with various tools. RoboCook uses point cloud scene representations, models tool-object interactions with Graph Neural Networks (GNNs), and combines tool classification with self-supervised policy learning to devise manipulation plans. We demonstrate that from just 20 minutes of real-world interaction data per tool, a general-purpose robot arm can learn complex long-horizon soft object manipulation tasks, such as making dumplings and alphabet letter cookies. Extensive evaluations show that RoboCook substantially outperforms state-of-the-art approaches, exhibits robustness against severe external disturbances, and demonstrates adaptability to different materials.