POLICEd RL: Learning Closed-Loop Robot Control Policies with Provable Satisfaction of Hard Constraints (2403.13297v3)
Abstract: In this paper, we seek to learn a robot policy guaranteed to satisfy state constraints. To encourage constraint satisfaction, existing RL algorithms typically rely on Constrained Markov Decision Processes and discourage constraint violations through reward shaping. However, such soft constraints cannot offer verifiable safety guarantees. To address this gap, we propose POLICEd RL, a novel RL algorithm explicitly designed to enforce affine hard constraints in closed loop with a black-box environment. Our key insight is to force the learned policy to be affine around the unsafe set and to use this affine region as a repulsive buffer that prevents trajectories from violating the constraint. We prove that such policies exist and guarantee constraint satisfaction. Our proposed framework applies to systems with both continuous and discrete state and action spaces and is agnostic to the choice of RL training algorithm. Our results demonstrate the capacity of POLICEd RL to enforce hard constraints in robotic tasks while significantly outperforming existing methods.
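The abstract's key mechanism, making a neural policy exactly affine over a designated region, can be realized with a POLICE-style bias adjustment: shift each layer's biases so that every vertex of the region shares one ReLU activation pattern, which makes the network one affine map over the region's convex hull. The sketch below is illustrative only (random weights, a unit-square region, and all function names are our own), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def police_bias_shift(W, b, V):
    """Return a shifted bias so all vertices in V share one activation pattern."""
    pre = V @ W.T + b                                  # pre-activations at each vertex
    sign = np.where(pre.mean(axis=0) >= 0, 1.0, -1.0)  # choose one sign per unit
    # smallest shift making sign * (pre + shift) >= 0 at every vertex
    shift = sign * np.maximum(0.0, (-sign * pre).max(axis=0))
    return b + shift

def relu(x):
    return np.maximum(0.0, x)

# Region: the unit square, described by its 4 vertices (hypothetical buffer region).
V = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Random 2-hidden-layer ReLU net with a linear head (stand-in for the policy).
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)
W3, b3 = rng.normal(size=(1, 16)), rng.normal(size=1)

# Fix one activation pattern per layer over the region, propagating the
# region's vertices forward through each adjusted layer.
b1 = police_bias_shift(W1, b1, V)
H1 = relu(V @ W1.T + b1)
b2 = police_bias_shift(W2, b2, H1)

def net(x):
    return relu(relu(x @ W1.T + b1) @ W2.T + b2) @ W3.T + b3

# The net is now affine on the square: convex combinations of vertices map
# to the same convex combinations of the vertex outputs.
y = net(V)
print(np.allclose(net(0.25 * V[0] + 0.75 * V[3]), 0.25 * y[0] + 0.75 * y[3]))
```

Because each layer's activation pattern is constant over the region, ReLU acts as either the identity or zero per unit there, so the composition reduces to a single affine map, which is exactly the structure the safety proof exploits.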
Authors:
- Jean-Baptiste Bouvier
- Kartik Nagpal
- Negar Mehr