Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory (2306.04026v4)

Published 6 Jun 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Guaranteeing safe behaviour of reinforcement learning (RL) policies poses significant challenges for safety-critical applications, despite RL's generality and scalability. To address this, we propose a new approach to apply verification methods from control theory to learned value functions. By analyzing task structures for safety preservation, we formalize original theorems that establish links between value functions and control barrier functions. Further, we propose novel metrics for verifying value functions in safe control tasks and practical implementation details to improve learning. Our work presents a novel method for certificate learning, which unlocks a diversity of verification techniques from control theory for RL policies, and marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems. Code and videos are available at https://rl-cbf.github.io/
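
The link the abstract describes can be sketched in standard notation; this is an illustrative reconstruction from textbook definitions, not the paper's exact theorem statement, and the symbols $f$, $r$, $\gamma$, $\alpha$, and $c$ are generic RL/control notation introduced here. In discrete time, a function $h$ is a control barrier function for the safe set $\mathcal{S} = \{x : h(x) \ge 0\}$ if some admissible action keeps it from decaying past zero at the next state:

\[
  \max_{u \in \mathcal{U}} h\bigl(f(x, u)\bigr) \;\ge\; \alpha\, h(x), \qquad \alpha \in [0, 1),
\]

while an optimal value function obeys the Bellman optimality equation, which has the same one-step, maximize-over-actions form:

\[
  V^{*}(x) \;=\; \max_{u \in \mathcal{U}} \Bigl[ r(x, u) + \gamma\, V^{*}\bigl(f(x, u)\bigr) \Bigr].
\]

Under conditions on the reward design and discount factor of the kind the paper's theorems formalize, a shifted value function $V^{*} - c$ satisfies the barrier inequality, so the superlevel set $\{x : V^{*}(x) \ge c\}$ is forward invariant under the greedy policy and the learned value function can be checked with CBF verification tools.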
