Neural Lyapunov and Optimal Control (2305.15244v4)

Published 24 May 2023 in cs.RO

Abstract: Despite impressive results, reinforcement learning (RL) suffers from slow convergence and requires a large variety of tuning strategies. In this paper, we investigate the performance of RL algorithms on simple continuous control tasks and show that, without reward and environment tuning, RL suffers from poor convergence. In turn, we introduce an optimal control (OC) theoretic learning-based method that solves the same problems robustly with simple, parsimonious costs. We use the Hamilton-Jacobi-Bellman (HJB) equation and first-order gradients to learn optimal time-varying value functions and, therefore, policies. We show that a relaxation of our objective yields time-varying Lyapunov functions, further verifying our approach by providing guarantees over a compact set of initial conditions. We compare our method to Soft Actor-Critic (SAC) and Proximal Policy Optimisation (PPO). In this comparison, we solve all tasks, never underperform in task cost, and show that, at our point of convergence, we outperform SAC and PPO in the best case by four and two orders of magnitude, respectively.
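
For context, such methods rest on the finite-horizon Hamilton-Jacobi-Bellman equation; a standard form (the paper's exact formulation may differ) is

```latex
\frac{\partial V}{\partial t}(x, t)
  + \min_{u}\Big[\, \ell(x, u) + \nabla_x V(x, t)^\top f(x, u) \,\Big] = 0,
\qquad V(x, T) = \ell_T(x),
```

where f is the system dynamics, ℓ the running cost, and ℓ_T the terminal cost. Below is a minimal sketch of training a time-varying neural value function against the HJB residual, assuming control-affine dynamics and quadratic costs on a double integrator; the dynamics, cost weights, and network architecture are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch: HJB-residual training of a time-varying value network.
# All modelling choices here are illustrative assumptions, not the paper's
# actual implementation.
import jax
import jax.numpy as jnp

# Double integrator: state x = (position, velocity), scalar control u.
B = jnp.array([[0.0], [1.0]])  # control input matrix (control-affine system)
Q = jnp.eye(2)                 # running state-cost weight (assumed)
R = jnp.eye(1)                 # running control-cost weight (assumed)

def dynamics(x, u):
    return jnp.array([x[1], 0.0]) + B @ u

def running_cost(x, u):
    return x @ Q @ x + u @ R @ u

def init_params(key, sizes=(3, 32, 32, 1)):
    # Random MLP weights; the input is the 2-D state plus the scalar time.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(0.1 * jax.random.normal(k, (o, i)), jnp.zeros(o))
            for k, i, o in zip(keys, sizes[:-1], sizes[1:])]

def value(params, x, t):
    # Small MLP V_theta(x, t) over the concatenated state-time input.
    h = jnp.concatenate([x, jnp.atleast_1d(t)])
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return (W @ h + b)[0]

def hjb_residual(params, x, t):
    dV_dx = jax.grad(value, argnums=1)(params, x, t)
    dV_dt = jax.grad(value, argnums=2)(params, x, t)
    # With a quadratic control cost, the HJB-minimising control has the
    # closed form u* = -0.5 R^{-1} B^T dV/dx for control-affine dynamics.
    u_star = -0.5 * jnp.linalg.solve(R, B.T @ dV_dx)
    # Residual of dV/dt + l(x, u*) + dV/dx . f(x, u*) = 0.
    return dV_dt + running_cost(x, u_star) + dV_dx @ dynamics(x, u_star)

def loss(params, xs, ts):
    # Mean squared HJB residual over sampled (state, time) pairs; a terminal
    # penalty tying V(x, T) to the terminal cost would be added in practice.
    res = jax.vmap(lambda x, t: hjb_residual(params, x, t))(xs, ts)
    return jnp.mean(res ** 2)
```

Gradients of this loss with respect to the network parameters can then be taken with jax.grad and passed to a standard first-order optimiser, matching at sketch level the paper's stated use of first-order gradients.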

References (35)
  1. J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, p. eaau5872, 2019.
  2. O. M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray et al., “Learning dexterous in-hand manipulation,” The International Journal of Robotics Research, vol. 39, no. 1, pp. 3–20, 2020.
  3. B. Katz, J. D. Carlo, and S. Kim, “Mini cheetah: A platform for pushing the limits of dynamic quadruped control,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 6295–6301.
  4. Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of complex behaviors through online trajectory optimization,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 4906–4913.
  5. P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” in Proceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018.
  6. H. J. Suh, M. Simchowitz, K. Zhang, and R. Tedrake, “Do differentiable simulators give better policy gradients?” in International Conference on Machine Learning. PMLR, 2022, pp. 20668–20696.
  7. S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake, “Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot,” Autonomous robots, vol. 40, pp. 429–455, 2016.
  8. I. O. Sandoval, P. Petsagkourakis, and E. A. del Rio-Chanona, “Neural ODEs as feedback policies for nonlinear optimal control,” arXiv preprint arXiv:2210.11245, 2022.
  9. T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning. PMLR, 2018, pp. 1861–1870.
  10. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  11. J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning. PMLR, 2015, pp. 1889–1897.
  12. T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
  13. B. Recht, “A tour of reinforcement learning: The view from continuous control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.
  14. I. Mordatch, K. Lowrey, G. Andrew, Z. Popovic, and E. V. Todorov, “Interactive control of diverse complex characters with neural networks,” Advances in neural information processing systems, vol. 28, 2015.
  15. J. Wang, T. S. Lembono, S. Kim, S. Calinon, S. Vijayakumar, and S. Tonneau, “Learning to guide online multi-contact receding horizon planning,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 12942–12949.
  16. H. Li, R. J. Frei, and P. M. Wensing, “Model hierarchy predictive control of robotic systems,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3373–3380, 2021.
  17. M. Lutter, S. Mannor, J. Peters, D. Fox, and A. Garg, “Value iteration in continuous actions, states and time,” arXiv preprint arXiv:2105.04682, 2021.
  18. K. Lowrey, A. Rajeswaran, S. Kakade, E. Todorov, and I. Mordatch, “Plan online, learn offline: Efficient learning and exploration via model-based control,” arXiv preprint arXiv:1811.01848, 2018.
  19. C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” in Conference on Robot Learning. PMLR, 2022, pp. 1724–1735.
  20. W. Xiao, T.-H. Wang, R. Hasani, M. Chahine, A. Amini, X. Li, and D. Rus, “BarrierNet: Differentiable control barrier functions for learning of safe robot control,” IEEE Transactions on Robotics, pp. 1–19, 2023.
  21. Y.-C. Chang, N. Roohi, and S. Gao, “Neural Lyapunov control,” Advances in neural information processing systems, vol. 32, 2019.
  22. W. Xiao, R. Hasani, X. Li, and D. Rus, “BarrierNet: A safety-guaranteed layer for neural networks,” arXiv preprint arXiv:2111.11277, 2021.
  23. S. Ainsworth, K. Lowrey, J. Thickstun, Z. Harchaoui, and S. Srinivasa, “Faster policy learning with continuous-time gradients,” in Learning for Dynamics and Control.   PMLR, 2021, pp. 1054–1067.
  24. X. Zhang, J. Long, W. Hu, W. E, and J. Han, “Initial value problem enhanced sampling for closed-loop optimal control design with deep neural networks,” 2023. [Online]. Available: https://openreview.net/forum?id=oXM5kdnAUNq
  25. S. Bansal and C. J. Tomlin, “DeepReach: A deep learning approach to high-dimensional reachability,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1817–1824.
  26. S. Engin and V. Isler, “Neural optimal control using learned system dynamics,” arXiv preprint arXiv:2302.09846, 2023.
  27. Y. D. Zhong, B. Dey, and A. Chakraborty, “Symplectic ODE-Net: Learning Hamiltonian dynamics with control,” in International Conference on Learning Representations, 2020. [Online]. Available: https://openreview.net/forum?id=ryxmb1rKDS
  28. D. H. Jacobson, “New second-order and first-order algorithms for determining optimal control: A differential dynamic programming approach,” Journal of Optimization Theory and Applications, vol. 2, pp. 411–440, 1968.
  29. R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in neural information processing systems, vol. 31, 2018.
  30. Y. Lin and E. D. Sontag, “A universal formula for stabilization with bounded controls,” Systems & control letters, vol. 16, no. 6, pp. 393–397, 1991.
  31. B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 70. PMLR, 2017, pp. 146–155.
  32. A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-baselines3: Reliable reinforcement learning implementations,” Journal of Machine Learning Research, vol. 22, no. 268, pp. 1–8, 2021. [Online]. Available: http://jmlr.org/papers/v22/20-1364.html
  33. E. Todorov and W. Li, “Optimal control methods suitable for biomechanical systems,” in Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No.03CH37439), vol. 2, 2003, pp. 1758–1761.
  34. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” 2016.
  35. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
