Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning (2405.02754v1)
Abstract: Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents that consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on the persistent satisfaction of hard safety constraints. However, these methods require explicit analytical models of the system dynamics to synthesize safe control, and such models are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, which synthesizes safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and the corresponding safe control law solely by querying a black-box dynamics function (e.g., a digital-twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite-time convergence to the safe set and forward invariance of the safe set for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while attaining $95\% \pm 9\%$ of the cumulative reward of state-of-the-art safe DRL methods. Furthermore, the algorithm scales well to high-dimensional systems through parallel computing.
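To make the safeguard idea concrete, here is a minimal Python sketch of a model-free safety filter that only queries a black-box step function, in the spirit of the abstract. Everything here is an illustrative assumption rather than the authors' implementation: the toy safety index `safety_index`, the margin `eta`, the sampling-based action projection (a stand-in for whatever search or optimization the paper uses), and the signature `black_box_step(state, action) -> next_state` standing in for a digital-twin simulator.

```python
import numpy as np

def safety_index(state):
    """Hypothetical safety index phi(x); phi <= 0 is treated as safe.
    Toy example: keep a 2D position at least d_min away from the origin."""
    d_min = 0.5  # assumed safety margin
    return d_min - np.linalg.norm(state[:2])

def safeguard(black_box_step, state, rl_action, action_low, action_high,
              eta=0.01, n_samples=256):
    """Filter an RL action using only black-box dynamics queries.

    black_box_step(state, action) -> next_state is the sole system access
    (e.g., a digital-twin simulator); no analytic dynamics model is needed.
    An action is accepted if the next state's safety index satisfies the
    discrete-time condition phi(x') <= max(phi(x) - eta, 0)."""
    target = max(safety_index(state) - eta, 0.0)

    def is_safe(action):
        return safety_index(black_box_step(state, action)) <= target

    if is_safe(rl_action):
        return rl_action  # RL action already safe: pass it through unchanged

    # Otherwise, sample candidate actions and return the safe one closest
    # to the RL action (least intervention among the samples).
    candidates = np.random.uniform(action_low, action_high,
                                   size=(n_samples, len(action_low)))
    safe = [a for a in candidates if is_safe(a)]
    if safe:
        return min(safe, key=lambda a: np.linalg.norm(a - rl_action))
    # Fallback: the sampled action that decreases the safety index the most.
    return min(candidates,
               key=lambda a: safety_index(black_box_step(state, a)))
```

Random sampling is used here purely for readability; the paper's finite-time convergence and forward-invariance guarantees additionally rely on synthesizing a safety index with a nonempty safe control set at every state, which this sketch simply assumes is given.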
Authors: Weiye Zhao, Tairan He, Feihan Li, Changliu Liu