
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources (2306.08364v1)

Published 14 Jun 2023 in stat.ML, cs.IT, cs.LG, and math.IT

Abstract: Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims to rigorously understand offline RL with multiple datasets collected from randomly perturbed versions of the target task rather than from the target task itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to the usual requirement on the number of data samples. A novel HetPEVI algorithm is then proposed, which simultaneously accounts for the sample uncertainties from a finite number of data samples per source and the source uncertainties from a finite number of available sources. Theoretical analyses show that HetPEVI can solve the target task as long as the data sources collectively provide good data coverage. Moreover, HetPEVI is shown to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, demonstrating the generality of the proposed designs and analyses.
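
To make the algorithmic idea concrete, the sketch below illustrates pessimism-based value iteration over multiple perturbed data sources in the spirit of HetPEVI: per-source empirical transition models are averaged, and each value update subtracts two bonuses, one shrinking with the per-source sample counts (sample uncertainty) and one shrinking with the number of sources (source uncertainty). This is a minimal tabular sketch under assumed inputs; the function name het_pevi, the bonus shapes, and the constants c_sample and c_source are illustrative assumptions, not the paper's exact construction or guarantees.

```python
import numpy as np

def het_pevi(datasets, S, A, H, rewards, c_sample=1.0, c_source=1.0):
    """Illustrative pessimistic value iteration over K perturbed sources.

    datasets: list of K iterables of (h, s, a, s_next) transition tuples.
    rewards:  array of shape (H, S, A) with values in [0, 1] (assumed known).
    """
    K = len(datasets)
    counts = np.zeros((K, H, S, A, S))           # per-source transition counts
    for k, data in enumerate(datasets):
        for (h, s, a, s_next) in data:
            counts[k, h, s, a, s_next] += 1

    n = counts.sum(axis=-1)                      # visit counts n_k(h, s, a)
    P_hat = counts / np.maximum(n[..., None], 1) # per-source empirical models
    P_bar = P_hat.mean(axis=0)                   # model averaged over sources

    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        # Sample-uncertainty bonus: shrinks as per-source counts n_k grow.
        b_sample = c_sample * H * np.sqrt(1.0 / np.maximum(n[:, h], 1)).mean(axis=0)
        # Source-uncertainty bonus: shrinks as the number of sources K grows.
        b_source = c_source * H / np.sqrt(K)
        Q = rewards[h] + P_bar[h] @ V[h + 1] - b_sample - b_source
        Q = np.clip(Q, 0.0, H)                   # pessimistic, truncated values
        policy[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return policy, V
```

In this toy form, acting greedily on the pessimistic Q-values yields a policy whose estimated value is not inflated on state-action pairs the combined datasets cover poorly, which is the intuition behind the collective-coverage condition stated in the abstract.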
