Resilient Constrained Reinforcement Learning (2312.17194v2)
Abstract: We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not fixed before training. Identifying appropriate constraint specifications is challenging because the trade-off between the reward maximization objective and constraint satisfaction is not defined in advance, a situation ubiquitous in constrained decision-making. To tackle this issue, we propose a new constrained RL approach that searches for the policy and the constraint specifications together. The method adaptively relaxes the constraints according to a relaxation cost introduced into the learning objective. Since this feature mimics how ecological systems adapt to disruptions by altering their operation, we term our approach resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance constraint satisfaction and reward maximization in the notion of a resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and develop two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and effectiveness of our approach in computational experiments.
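The relaxation mechanism the abstract describes — a constraint that can be loosened by a slack variable whose relaxation cost enters the objective, with policy and specification optimized jointly by a primal-dual scheme — can be sketched on a toy problem. The quadratic objective, linear constraint, and quadratic relaxation cost below are illustrative assumptions for a minimal sketch, not the paper's RL formulation or its algorithms.

```python
# Toy resilient constrained problem (illustrative assumptions, not the paper's setup):
#   maximize   f(x) = -(x - 2)^2                 (reward surrogate)
#   subject to g(x) = x <= b + u,  u >= 0        (constraint relaxed by slack u)
#   where relaxing costs h(u) = 0.5 * alpha * u^2 in the objective.
# Primal-dual gradient iteration on L(x, u, lam) = f(x) - h(u) - lam * (g(x) - b - u):
# ascend in the primal variables (x, u), descend in the multiplier lam.
alpha, b = 1.0, 1.0
eta = 0.01  # step size

x, u, lam = 0.0, 0.0, 0.0
for _ in range(20000):
    grad_x = -2.0 * (x - 2.0) - lam    # d L / d x
    grad_u = -alpha * u + lam          # d L / d u: relax only while the multiplier justifies the cost
    grad_lam = x - b - u               # constraint violation drives the multiplier
    x += eta * grad_x
    u = max(0.0, u + eta * grad_u)     # project slack onto u >= 0
    lam = max(0.0, lam + eta * grad_lam)

# For this toy problem the equilibrium has a closed form: x = 5/3, u = 2/3, lam = 2/3,
# i.e. the constraint is relaxed exactly until its marginal cost matches the marginal reward.
print(x, u, lam)
```

The fixed point balances the two forces: the multiplier that would be needed to hold the original constraint is larger than the marginal relaxation cost, so the slack grows until the two match, which is the kind of equilibrium between constraint satisfaction and reward maximization the abstract refers to.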
- Dongsheng Ding
- Zhengyan Huan
- Alejandro Ribeiro