Compositional Servoing by Recombining Demonstrations (2310.04271v1)
Abstract: Learning-based manipulation policies that operate on image inputs often transfer poorly across tasks. In contrast, visual servoing methods enable efficient task transfer in high-precision scenarios while requiring only a few demonstrations. In this work, we present a framework that formulates the visual servoing task as graph traversal. Our method not only improves the robustness of visual servoing but also enables multitask capability from a few task-specific demonstrations. We construct demonstration graphs by splitting existing demonstrations into segments and recombining them. To traverse the demonstration graph at inference time, we use a similarity function that selects the best demonstration for the task at hand, which lets us compute the shortest path through the graph. Ultimately, we show that recombining demonstrations leads to higher per-task success. We present extensive simulation and real-world experimental results that demonstrate the efficacy of our approach.
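To make the graph-traversal idea concrete, here is a minimal Python sketch of how segmented demonstrations could be assembled into a graph and traversed with a shortest-path search. The segment representation, the cosine-similarity metric, the `1 - similarity` edge cost, and all names below are illustrative assumptions, not the paper's actual implementation.

```python
import heapq
import numpy as np

# Sketch (assumed structure, not the paper's code): each demonstration is split
# into segments; segments become nodes, and an edge connects segment A to
# segment B when the observation at the end of A looks similar to the
# observation at the start of B.

def cosine_similarity(a, b):
    """Placeholder similarity between two observation embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_graph(segments, sim_threshold=0.9):
    """segments: list of (start_embedding, end_embedding, segment_id).

    Returns an adjacency dict: node -> list of (neighbor, cost), where the
    cost is 1 - similarity, so transitions between similar observations
    are cheaper to traverse.
    """
    graph = {seg_id: [] for _, _, seg_id in segments}
    for _, end_a, id_a in segments:
        for start_b, _, id_b in segments:
            if id_a == id_b:
                continue
            sim = cosine_similarity(end_a, start_b)
            if sim >= sim_threshold:
                graph[id_a].append((id_b, 1.0 - sim))
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra over the demonstration graph; returns a list of segment ids."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return None  # goal not reachable from the start segment
```

In this sketch, the start node would be chosen by comparing the current observation against segment start embeddings with the same similarity function, and the returned segment sequence would then be executed one segment at a time by the underlying visual-servoing controller.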