Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data (2401.01734v2)
Abstract: Assistive robots should be able to wash, fold or iron clothes. However, due to the variety, deformability and self-occlusions of clothes, creating robot systems for cloth manipulation is challenging. Synthetic data is a promising direction to improve generalization, but the sim-to-real gap limits its effectiveness. To advance the use of synthetic data for cloth manipulation tasks such as robotic folding, we present a synthetic data pipeline to train keypoint detectors for almost-flattened cloth items. To evaluate its performance, we have also collected a real-world dataset. We train detectors for T-shirts, towels and shorts and obtain an average precision of 64% and an average keypoint distance of 18 pixels. Fine-tuning on real-world data improves performance to 74% mAP and an average keypoint distance of only 9 pixels. Furthermore, we describe failure modes of the keypoint detectors and compare different approaches to obtain cloth meshes and materials. We also quantify the remaining sim-to-real gap and argue that improvements to the fidelity of cloth assets will be required to reduce this gap. The code, dataset and trained models are publicly available.
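To make the two reported metrics concrete, below is a minimal sketch (not the authors' evaluation code) of how a mean keypoint distance and a simple precision-at-threshold score can be computed from matched predicted and ground-truth keypoints. The function names, the 512-pixel image size, and the 20-pixel threshold are illustrative assumptions; the paper's actual average precision follows a COCO-style keypoint evaluation, for which the pixel-threshold precision here is only a rough stand-in.

```python
# Illustrative sketch of keypoint evaluation metrics, assuming predictions
# have already been matched one-to-one with ground-truth keypoints.
import numpy as np


def mean_keypoint_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average Euclidean pixel distance between matched predicted and
    ground-truth keypoints; both arrays have shape (N, 2)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def precision_at_threshold(pred: np.ndarray, gt: np.ndarray,
                           threshold_px: float = 20.0) -> float:
    """Fraction of predicted keypoints that land within threshold_px
    pixels of their ground-truth location (a simplified stand-in for
    a COCO-style average precision)."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return float((dists <= threshold_px).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0, 512, size=(8, 2))       # e.g. 4 corners over 2 images
    pred = gt + rng.normal(0, 10, size=(8, 2))  # noisy simulated detections
    print(f"mean keypoint distance: {mean_keypoint_distance(pred, gt):.1f} px")
    print(f"precision@20px: {precision_at_threshold(pred, gt):.2f}")
```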