Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation (2403.14305v2)
Abstract: Sample-efficient learning of manipulation skills poses a major challenge in robotics. While recent approaches demonstrate impressive advances in the types of tasks that can be addressed and the sensing modalities that can be incorporated, they still require large amounts of training data. This is especially problematic for learning actions on robots in the real world, where both demonstrations and robot interactions are costly. To address this challenge, we introduce BOpt-GMM, a hybrid approach that combines imitation learning with the robot's own experience collection. We first learn a skill model as a dynamical system encoded in a Gaussian Mixture Model from a few demonstrations. We then improve this model with Bayesian optimization, building on a small number of autonomous skill executions in a sparse-reward setting. We demonstrate the sample efficiency of our approach on multiple complex manipulation skills in both simulation and real-world experiments. Furthermore, we make the code and pre-trained models publicly available at http://bopt-gmm.cs.uni-freiburg.de.
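The following is a minimal, illustrative sketch of the two stages described in the abstract, not the authors' implementation: (1) fit a GMM dynamical system from a few demonstrations and query it via Gaussian mixture regression (GMR), and (2) tune a low-dimensional update of the model with Bayesian optimization against a sparse episodic reward. Here `run_episode` is a hypothetical rollout function returning 1.0 on task success and 0.0 otherwise, the optimized parameters are additive offsets on the mixture priors (a stand-in for the paper's GMM update parameterization), and scikit-optimize's `gp_minimize` substitutes for whichever Bayesian optimizer the paper uses.

```python
# Sketch under the assumptions stated above; not the BOpt-GMM codebase.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
from skopt import gp_minimize
from skopt.space import Real


def fit_gmm(demos, n_components=5):
    """Fit a GMM over stacked [position, velocity] samples from demonstrations."""
    data = np.vstack([np.hstack([d["pos"], d["vel"]]) for d in demos])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(data)
    return gmm.weights_, gmm.means_, gmm.covariances_


def gmr_velocity(weights, means, covs, pos):
    """Predict a velocity command at `pos` by conditioning the joint GMM (GMR)."""
    d = pos.shape[0]
    cond_means, resp = [], []
    for pi, mu, cov in zip(weights, means, covs):
        mu_p, mu_v = mu[:d], mu[d:]
        cov_pp, cov_vp = cov[:d, :d], cov[d:, :d]
        # Conditional mean of velocity given position for this component.
        cond_means.append(mu_v + cov_vp @ np.linalg.solve(cov_pp, pos - mu_p))
        # Responsibility of this component for the current position.
        resp.append(pi * multivariate_normal.pdf(pos, mean=mu_p, cov=cov_pp))
    resp = np.asarray(resp) / (np.sum(resp) + 1e-12)
    return resp @ np.asarray(cond_means)


def optimize_policy(weights, means, covs, run_episode, n_calls=30, n_rollouts=5):
    """Bayesian optimization over additive offsets to the mixture priors."""
    def objective(theta):
        w = np.clip(weights + np.asarray(theta), 1e-3, None)
        w /= w.sum()
        policy = lambda pos: gmr_velocity(w, means, covs, pos)
        # Sparse reward: average success over a handful of autonomous rollouts.
        success = np.mean([run_episode(policy) for _ in range(n_rollouts)])
        return -success  # gp_minimize minimizes the objective

    space = [Real(-0.2, 0.2) for _ in range(len(weights))]
    result = gp_minimize(objective, space, n_calls=n_calls, random_state=0)
    return result.x, -result.fun
```

A full system would typically also adapt the component means and covariances and use a dedicated optimizer; the sketch only illustrates the overall loop of demonstration-based initialization followed by sparse-reward Bayesian optimization.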