
Globally Stable Neural Imitation Policies (2403.04118v2)

Published 7 Mar 2024 in cs.RO and cs.LG

Abstract: Imitation learning offers an effective way to avoid the resource-intensive and time-consuming process of learning a policy from scratch. Although the resulting policy can reliably mimic expert demonstrations, it often behaves unpredictably in unexplored regions of the state space, raising significant safety concerns under perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning framework that produces a policy with formal stability guarantees. We deploy a neural policy architecture that represents stability via Lyapunov's stability theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach through extensive experiments in simulation and by deploying the trained policies on a real-world manipulator arm. The results demonstrate that our method overcomes the instability, accuracy, and computational-cost problems of previous imitation learning methods, making it a promising solution for stable policy learning in complex planning scenarios.
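The core idea of stability-by-construction can be illustrated with a minimal NumPy sketch. This is not the authors' SNDS implementation: it uses a fixed quadratic Lyapunov candidate V(x) = xᵀPx purely for illustration, whereas SNDS trains a neural policy and a neural Lyapunov candidate jointly. The projection step below follows the construction of Kolter and Manek (2019), which guarantees V̇(x) ≤ −αV(x) for any nominal policy output, so trajectories converge to the equilibrium at the origin.

```python
import numpy as np

def lyapunov(x, P):
    """Quadratic Lyapunov candidate V(x) = x^T P x (P positive definite)."""
    return x @ P @ x

def grad_lyapunov(x, P):
    """Gradient of V: 2 P x."""
    return 2.0 * P @ x

def stabilize(f_nominal, x, P, alpha=0.5):
    """Project a nominal policy output so that V_dot <= -alpha * V.

    Sketch of the stability projection from Kolter & Manek (2019); SNDS-style
    methods enforce a comparable decrease condition with a learned V.
    """
    v = lyapunov(x, P)
    g = grad_lyapunov(x, P)
    f = f_nominal(x)
    violation = g @ f + alpha * v        # positive => decrease condition violated
    if violation > 0.0:
        f = f - g * violation / (g @ g)  # minimal correction along grad V
    return f

# Toy check: the unstable nominal field x_dot = +x becomes stable after projection.
P = np.eye(2)
x = np.array([1.0, -0.5])
f_proj = stabilize(lambda s: s.copy(), x, P, alpha=0.5)
v_dot = grad_lyapunov(x, P) @ f_proj   # satisfies v_dot <= -0.5 * V(x)
```

Because the correction is applied pointwise, the guarantee holds globally regardless of how well the underlying network fits the demonstrations, which is what separates this family of methods from unconstrained behavioral cloning.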
