Interactive Perception for Deformable Object Manipulation (2403.05177v2)
Abstract: Interactive perception enables robots to manipulate the environment and objects so as to bring them into states that benefit the perception process. Deformable objects pose particular challenges here because they are difficult to manipulate and cause heavy occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers the motion regularity and structure in coupling the camera and manipulator. We contribute a method for constructing and computing a subspace, called the Dynamic Active Vision Space (DAVS), that effectively exploits this regularity during motion exploration. The effectiveness of the framework and approach is validated both in simulation and on a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinated motion in interactive perception for deformable objects.
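The abstract gives no implementation details, but the core idea of coupling camera motion to manipulator motion can be illustrated with a minimal sketch. The sketch below assumes a simplified setting where a DAVS-like subspace is approximated as a cone of candidate camera-motion directions around the manipulator's current motion direction, and where viewpoint quality is scored by a toy distance measure standing in for a real occlusion/visibility metric. All names here (`davs_directions`, `score_view`, `step`) are hypothetical illustrations, not the paper's API.

```python
import numpy as np

# Hypothetical sketch of a coupled camera/manipulator decision loop.
# Not the paper's actual DAVS construction -- an illustration only.

rng = np.random.default_rng(0)

def davs_directions(grasp_dir, n=8, cone_half_angle=np.pi / 4):
    """Sample candidate camera-motion directions from a cone around the
    manipulator's motion direction: a stand-in for restricting camera
    exploration to a structured subspace coupled to the manipulation."""
    dirs = []
    for _ in range(n):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)
        # Keep only directions within the cone around grasp_dir.
        if np.dot(d, grasp_dir) >= np.cos(cone_half_angle):
            dirs.append(d)
    return dirs or [grasp_dir]

def score_view(cam_pos, target):
    """Toy perception score: prefer viewpoints near the target region.
    A real system would use an occlusion or visibility measure here."""
    return -np.linalg.norm(cam_pos - target)

def step(cam_pos, grasp_dir, target, step_size=0.05):
    """One sequential decision step: move the camera along the best
    candidate direction drawn from the restricted subspace."""
    candidates = davs_directions(grasp_dir)
    best = max(candidates,
               key=lambda d: score_view(cam_pos + step_size * d, target))
    return cam_pos + step_size * best

if __name__ == "__main__":
    cam = np.array([0.5, 0.0, 0.5])
    grasp_dir = np.array([0.0, 1.0, 0.0])   # manipulator motion direction
    target = np.array([0.0, 0.4, 0.2])      # region to keep visible
    for _ in range(10):
        cam = step(cam, grasp_dir, target)
    print("final camera position:", cam)
```

The design point this illustrates is that the camera does not search the full viewpoint space: its candidate motions are restricted to a subspace defined relative to the manipulation, which is the kind of motion regularity the DAVS construction is described as exploiting.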