On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness (2210.10464v2)
Abstract: Generalization in Reinforcement Learning (RL) aims to train an agent that generalizes to the target environment. This paper studies RL generalization from a theoretical perspective: how much can we expect pre-training over training environments to help? When interaction with the target environment is not allowed, we certify that the best we can obtain is a policy that is near-optimal in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that, asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.