Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning (2404.06188v1)

Published 9 Apr 2024 in cs.LG and cs.AI

Abstract: Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions. To address this, existing uncertainty-based methods penalize the value function with uncertainty quantification and demand numerous ensemble networks, which poses computational challenges and can yield suboptimal outcomes. In this paper, we introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values. It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values. By applying moderate value penalties for OOD actions, our method fosters a provably pessimistic approach. We also emphasize diversity within the randomized value functions and enhance efficiency by introducing a diversity regularization method, reducing the requisite number of networks. These modules lead to reliable value estimation and efficient policy learning from offline data. Theoretical analysis shows that our method recovers the provably efficient LCB-penalty under linear MDP assumptions. Extensive empirical results also demonstrate that our proposed method significantly outperforms baseline methods in terms of performance and parametric efficiency.
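
For illustration only, the sketch below (not the authors' released code) shows the general idea described in the abstract in PyTorch: an ensemble of randomized Q-networks gives a lower-confidence-bound estimate as the ensemble mean minus a multiple of the ensemble standard deviation, and a simple pairwise disagreement term can act as a diversity regularizer. All names (`QNetwork`, `lcb_estimate`, `diversity_term`, `beta`) and the specific form of the regularizer are assumptions made for this example, not details taken from the paper.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """A single randomized Q-network; an ensemble of these, trained from
    independent random initializations, approximates a posterior over Q-values."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def lcb_estimate(q_nets, state, action, beta: float = 1.0) -> torch.Tensor:
    """Pessimistic value estimate: ensemble mean minus beta times the
    ensemble standard deviation (an LCB-style penalty)."""
    qs = torch.stack([q(state, action) for q in q_nets], dim=0)  # (K, B, 1)
    return qs.mean(dim=0) - beta * qs.std(dim=0)


def diversity_term(q_nets, state, action) -> torch.Tensor:
    """Generic disagreement regularizer (an assumption for this sketch, not the
    paper's regularizer): the negative mean pairwise squared distance between
    ensemble predictions; adding it to the loss with a small weight pushes the
    members' predictions apart."""
    qs = torch.stack([q(state, action) for q in q_nets], dim=0)  # (K, B, 1)
    pairwise = qs.unsqueeze(0) - qs.unsqueeze(1)                 # (K, K, B, 1)
    return -pairwise.pow(2).mean()


# Example usage on random data with a small ensemble:
q_nets = [QNetwork(state_dim=17, action_dim=6) for _ in range(4)]
s, a = torch.randn(32, 17), torch.randn(32, 6)
pessimistic_q = lcb_estimate(q_nets, s, a, beta=2.0)
loss_reg = diversity_term(q_nets, s, a)
```

In this kind of scheme, the `beta` coefficient controls the degree of pessimism toward OOD actions, while the diversity weight trades off ensemble disagreement against fit, which is what allows a smaller number of networks to provide useful uncertainty estimates.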

Authors (5)
  1. Xudong Yu (17 papers)
  2. Chenjia Bai (47 papers)
  3. Hongyi Guo (14 papers)
  4. Changhong Wang (15 papers)
  5. Zhen Wang (571 papers)
