Iterative Reachability Estimation for Safe Reinforcement Learning (2309.13528v1)
Abstract: Ensuring safety is important for the practical deployment of reinforcement learning (RL). Several challenges must be addressed simultaneously: handling stochasticity in the environment, providing rigorous guarantees of persistent state-wise safety, and avoiding overly conservative behaviors that sacrifice performance. We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained RL in general stochastic settings. In the feasible set, where violation-free policies exist, we optimize for reward while maintaining persistent safety. Outside this feasible set, our optimization produces the safest possible behavior: it guarantees entrance into the feasible set whenever possible, with the least cumulative discounted violation. We introduce a class of algorithms using our novel reachability estimation function to optimize within the proposed framework and in related frameworks, such as those concurrently handling multiple hard and soft constraints. We establish theoretically that our algorithms almost surely converge to locally optimal policies of our safe optimization framework. We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and demonstrate improvements in both reward performance and safety over state-of-the-art baselines.
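The abstract's key idea is a bifurcated objective driven by a reachability estimate: inside the feasible set, optimize reward under a no-violation constraint; outside it, minimize cumulative discounted violations to steer back toward feasibility. The sketch below illustrates this switching structure only; `respo_objective`, the `reach_est` input, and the `eps` threshold are hypothetical names and simplifications, not the paper's actual algorithm or notation.

```python
import numpy as np

def respo_objective(trajectory, reach_est, gamma=0.99, eps=0.05):
    """Toy sketch of a reachability-gated objective.

    trajectory: list of (reward, violation) pairs, one per time step
    reach_est:  estimated probability (in [0, 1]) that the current
                state can reach the unsafe set under the policy
    Returns a scalar to be maximized by the policy optimizer.
    """
    rewards = np.array([r for r, _ in trajectory])
    violations = np.array([v for _, v in trajectory])
    discounts = gamma ** np.arange(len(trajectory))
    if reach_est < eps:
        # Deemed feasible: maximize discounted reward
        # (persistent safety is enforced separately as a constraint).
        return float(np.sum(discounts * rewards))
    # Deemed infeasible: minimize cumulative discounted violations,
    # expressed here as maximizing their negation.
    return float(-np.sum(discounts * violations))
```

In the actual method, the reachability estimation function is learned jointly with the policy, and the two regimes are combined into a single optimization with convergence guarantees; this fragment only shows how the feasibility estimate gates which quantity is optimized.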
Authors:
- Milan Ganai
- Zheng Gong
- Chenning Yu
- Sylvia Herbert
- Sicun Gao