A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning (2306.07818v2)
Abstract: Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on the expected cumulative cost, using an existing dataset. In this paper, we propose the Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate, and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work, which requires concentrability and a strong Bellman completeness assumption, PDCA requires only concentrability and realizability assumptions for sample-efficient learning.
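As a rough illustration of the primal-dual loop described above, the sketch below instantiates it in a tabular setting. Everything in it is an assumption made for illustration, not the paper's implementation: the critic estimates are taken as fixed arrays `Q_r` and `Q_c`, the constraint is assumed to have the form "expected cost ≤ tau", a Hedge-style exponentiated-gradient update stands in for the no-regret policy optimization oracle, and the dual player's greedy step is a best response over a bounded interval [0, B].

```python
# Minimal sketch of a primal-dual loop on an estimated Lagrangian (illustrative only).
# Assumptions: tabular MDP, fixed critic estimates Q_r/Q_c, cost constraint V_c <= tau,
# Hedge update as the no-regret primal oracle, greedy best-response dual on [0, B].

import numpy as np

def pdca_sketch(Q_r, Q_c, d0, tau, B=10.0, T=200, eta=0.1):
    """Run T rounds of the primal-dual loop on the estimated Lagrangian.

    Q_r, Q_c : (S, A) critic estimates of reward and cost action values.
    d0       : (S,) initial-state distribution used to evaluate the Lagrangian.
    tau      : cost budget (assumed constraint form: expected cost <= tau).
    Returns the uniform mixture of the iterate policies, shape (S, A).
    """
    S, A = Q_r.shape
    pi = np.full((S, A), 1.0 / A)   # primal iterate: stochastic policy
    lam = 0.0                        # dual iterate
    mixture = np.zeros((S, A))

    for _ in range(T):
        # Lagrangian critic for the current dual variable: Q_r - lam * Q_c.
        Q_lag = Q_r - lam * Q_c

        # Primal player: no-regret (Hedge / exponentiated-gradient) update that
        # shifts probability toward actions with larger Lagrangian value.
        pi = pi * np.exp(eta * Q_lag)
        pi /= pi.sum(axis=1, keepdims=True)

        # Dual player: greedy best response on [0, B]. The Lagrangian is linear
        # in lam, so the minimizer sits at an endpoint: B if the (rough) cost
        # estimate of the current policy exceeds tau, otherwise 0.
        V_c_hat = d0 @ (pi * Q_c).sum(axis=1)
        lam = B if V_c_hat > tau else 0.0

        mixture += pi / T

    return mixture
```

Because the Lagrangian is linear in the dual variable, the greedy dual step reduces to picking an endpoint of [0, B], and the returned mixture of iterates plays the role of the near saddle point that the abstract refers to.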