
On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness (2210.10464v2)

Published 19 Oct 2022 in cs.LG

Abstract: Generalization in Reinforcement Learning (RL) aims to train an agent that transfers to a target environment. This paper studies RL generalization from a theoretical perspective: how much help can we expect from pre-training over the training environments? When interaction with the target environment is not allowed, we show that the best one can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that, asymptotically, the improvement from pre-training is at most a constant factor. In the non-asymptotic regime, however, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the size of the state-action space.
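To make the "near-optimal in an average sense" guarantee concrete, the following is a minimal formal sketch based only on the abstract; the environment distribution D, the value function V, and the tolerance epsilon are our notation and assumptions, not necessarily the paper's.

\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
% Hedged sketch: notation inferred from the abstract, not taken from the paper.
Let $\mathcal{D}$ be a distribution over MDPs from which both the training
environments and the (unseen) target environment $M$ are drawn, and write
$V^{\pi}_{M}$ for the value of policy $\pi$ in $M$. Without any interaction
with the target, pre-training can at best return a single policy $\hat{\pi}$
that is near-optimal \emph{on average} over $\mathcal{D}$:
\begin{equation}
  \mathbb{E}_{M \sim \mathcal{D}}\bigl[ V^{\hat{\pi}}_{M} \bigr]
  \;\ge\;
  \sup_{\pi}\, \mathbb{E}_{M \sim \mathcal{D}}\bigl[ V^{\pi}_{M} \bigr]
  \;-\; \varepsilon ,
\end{equation}
rather than near-optimal for every realization of $M$: without interaction,
the agent cannot identify which environment it has been deployed in.
\end{document}

Read this way, the asymptotic result in the abstract says that allowing interaction with the target improves on this average-case baseline by at most a constant factor in the limit, while the non-asymptotic regret bound quantifies the finite-sample benefit of pre-training.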
