The Feasibility of Constrained Reinforcement Learning Algorithms: A Tutorial Study (2404.10064v1)
Abstract: Satisfying safety constraints is a primary concern when solving optimal control problems (OCPs). Because of the infeasibility phenomenon, in which no constraint-satisfying solution exists, a feasible region must be identified before a policy is implemented. Existing feasibility theories, built for model predictive control (MPC), consider only the feasibility of the optimal policy. Reinforcement learning (RL), however, another important control method, computes the optimal policy iteratively, producing a series of non-optimal intermediate policies along the way. Feasibility analysis of these non-optimal policies is also necessary for iteratively improving constraint satisfaction, but it is not available under existing MPC feasibility theories. This paper proposes a feasibility theory that applies to both MPC and RL by filling in the missing feasibility analysis for an arbitrary policy. The basis of our theory is to decouple policy solving and policy implementation into two temporal domains: the virtual-time domain and the real-time domain. This decoupling allows us to separately define initial and endless feasibility, for both states and policies, together with their corresponding feasible regions. Building on these definitions, we analyze the containment relationships among the different feasible regions, which lets us describe the feasible region of an arbitrary policy. We further provide virtual-time constraint design rules, along with a practical design tool called the feasibility function, that help achieve the maximum feasible region. We review most existing constraint formulations and show that they are essentially applications of feasibility functions in different forms. Finally, we demonstrate our feasibility theory by visualizing the different feasible regions under both MPC and RL policies in an emergency braking control task.
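To make the idea of a feasibility function concrete, consider the emergency braking task the paper uses for its demonstration. The following is a minimal sketch, not the paper's implementation: it assumes a 1D point-mass model with an assumed maximum deceleration `A_MAX`, and it uses the classic braking-distance margin h(x) = d − v²/(2·a_max) as the feasibility function, so that h(x) ≥ 0 delimits the region from which a maximum-braking policy satisfies the distance constraint d ≥ 0 for all future time.

```python
import numpy as np

# Minimal sketch of a feasibility check for a 1D emergency-braking task.
# State x = (d, v): distance to obstacle d [m] and ego speed v [m/s].
# Real-time constraint: d >= 0 at every step.  A classic feasibility
# function is the braking-distance margin h(x) = d - v^2 / (2 * a_max):
# if h(x) >= 0, the maximum-braking policy keeps d >= 0 forever (up to
# a small Euler-discretization error), so x is endlessly feasible.

A_MAX = 8.0   # assumed maximum deceleration [m/s^2]
DT = 0.01     # simulation step [s]

def feasibility_function(d, v, a_max=A_MAX):
    """Braking-distance margin; h >= 0 marks the feasible region."""
    return d - v**2 / (2.0 * a_max)

def is_feasible(d, v):
    """State feasibility test: does x lie inside {h >= 0}?"""
    return feasibility_function(d, v) >= 0.0

def rollout_max_braking(d, v, a_max=A_MAX, dt=DT):
    """Simulate the maximum-braking policy in (virtual) time and report
    whether the constraint d >= 0 holds along the whole trajectory."""
    while v > 0.0:
        d -= v * dt
        v = max(v - a_max * dt, 0.0)
        if d < 0.0:
            return False
    return True

if __name__ == "__main__":
    for d0, v0 in [(30.0, 20.0), (20.0, 20.0)]:
        print(f"d={d0}, v={v0}: h={feasibility_function(d0, v0):+.2f}, "
              f"in feasible region: {is_feasible(d0, v0)}, "
              f"rollout safe: {rollout_max_braking(d0, v0)}")
```

For this simple double-integrator model, {h ≥ 0} is in fact the maximum feasible region: no policy can recover a state with h < 0, and any policy that brakes less aggressively than full braking is feasible only on a subset of it. The same check applied to an intermediate RL policy, rather than the maximum-braking one, would trace out that policy's own, generally smaller, feasible region.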