A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning (2306.07818v2)

Published 13 Jun 2023 in cs.LG and stat.ML

Abstract: Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose the Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate, and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA only requires concentrability and realizability assumptions for sample-efficient learning.
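To make the primal-dual interaction described in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a toy setting in which the candidate policies form a finite list and the critics are stand-in functions returning scalar value estimates; the names pdca_sketch, reward_critic, cost_critic, threshold, lam_max, and lr are hypothetical and chosen only for illustration. The primal player performs a no-regret (exponentiated-gradient) update to maximize the estimated Lagrangian, while the dual player greedily picks the multiplier in a bounded interval that minimizes it.

```python
import numpy as np

# Hedged sketch of a primal-dual loop on a critic-estimated Lagrangian.
# All names and the finite candidate-policy set are illustrative stand-ins,
# not the paper's general function-approximation machinery.

def pdca_sketch(reward_critic, cost_critic, policies, threshold,
                lam_max=10.0, num_iters=200, lr=0.5):
    """reward_critic(pi) / cost_critic(pi): estimated cumulative reward / cost
    of policy pi from offline data; threshold: cost constraint level."""
    n = len(policies)
    weights = np.ones(n) / n      # primal player: distribution over policies
    mixture = np.zeros(n)         # running average, approximates a saddle point

    for _ in range(num_iters):
        rewards = np.array([reward_critic(p) for p in policies])
        costs = np.array([cost_critic(p) for p in policies])

        # Dual player acts greedily: the Lagrangian
        #   L = reward + lambda * (threshold - cost)
        # is linear in lambda, so the minimizer sits at an endpoint of [0, lam_max].
        expected_cost = weights @ costs
        lam = 0.0 if expected_cost <= threshold else lam_max

        # Primal player: no-regret exponentiated-gradient step to maximize
        # the Lagrangian estimate over the candidate policies.
        lagrangian_per_policy = rewards + lam * (threshold - costs)
        weights *= np.exp(lr * lagrangian_per_policy)
        weights /= weights.sum()

        mixture += weights

    return mixture / num_iters    # mixture weights over candidate policies


if __name__ == "__main__":
    # Toy example with three fixed policies and made-up critic estimates.
    reward_est = {0: 1.0, 1: 0.8, 2: 0.3}
    cost_est = {0: 0.9, 1: 0.4, 2: 0.1}
    mix = pdca_sketch(lambda p: reward_est[p], lambda p: cost_est[p],
                      policies=[0, 1, 2], threshold=0.5)
    print("mixture over candidate policies:", mix)
```

The returned mixture policy plays the role of the near saddle point mentioned in the abstract: averaging the primal iterates is the standard way a no-regret-versus-greedy scheme yields an approximately optimal, approximately feasible policy.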
