A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs (2402.04493v2)

Published 7 Feb 2024 in stat.ML and cs.LG

Abstract: We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, where the goal is to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $O(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting to achieve a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption, improving upon recent work that requires $O(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
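
For context, the constrained problem the abstract refers to can be written as a saddle-point program. The following is a minimal sketch in standard constrained-MDP notation; the symbols $u$, $\tau$, $\lambda$, $\phi$, $\mu$, and $\theta$ are illustrative assumptions, not taken from the paper itself:

$$J_g(\pi) := \mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t\, g(s_t, a_t)\right], \qquad \max_{\pi}\; J_r(\pi) \quad \text{subject to} \quad J_u(\pi) \ge \tau,$$

which primal-dual methods address through the Lagrangian

$$\max_{\pi}\,\min_{\lambda \ge 0}\; L(\pi, \lambda) = J_r(\pi) + \lambda\,\big(J_u(\pi) - \tau\big),$$

alternating updates of the policy $\pi$ (primal) and the multiplier $\lambda$ (dual). The linear MDP assumption means transitions and rewards factor through a known $d$-dimensional feature map, $P(s' \mid s, a) = \langle \phi(s,a), \mu(s') \rangle$ and $r(s,a) = \langle \phi(s,a), \theta \rangle$, which is what allows value estimates to be computed efficiently from the offline dataset.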
