SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning (2405.15920v2)
Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned with deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
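The decomposition in the abstract can be made concrete. For a feature map φ(s, a) with task reward r(s, a) = φ(s, a)^⊤ w, the successor features ψ^π(s, a) (the expected discounted sum of future features under policy π) satisfy Q^π(s, a) = ψ^π(s, a)^⊤ w, and GPI selects the action argmax_a max_i ψ^{π_i}(s, a)^⊤ w over previously learned policies π_1, …, π_k. Below is a minimal NumPy sketch of this selection rule; the tabular shapes and random arrays are illustrative assumptions standing in for the deep-network outputs of SF-DQN, not the paper's implementation.

```python
# Minimal sketch (not the authors' implementation) of the SF decomposition
# Q^pi(s, a) = psi^pi(s, a)^T w and generalized policy improvement (GPI).
# The tabular setting and all shapes below are illustrative assumptions.
import numpy as np

n_states, n_actions, d = 5, 3, 4   # hypothetical sizes; d = feature dimension
k = 2                              # number of previously learned source policies
rng = np.random.default_rng(0)

# Successor features psi^{pi_i}(s, a) for each source policy.
# In SF-DQN these would be outputs of a deep network; here, fixed random arrays.
psi = rng.standard_normal((k, n_states, n_actions, d))

# Reward mapping w for the new task: r(s, a) = phi(s, a)^T w, so the new task's
# Q-value under source policy pi_i is Q^{pi_i}(s, a) = psi^{pi_i}(s, a)^T w.
w = rng.standard_normal(d)

def gpi_action(state: int) -> int:
    """GPI: act greedily w.r.t. the maximum over source-policy Q-values,
    i.e. argmax_a max_i psi^{pi_i}(s, a)^T w."""
    q_values = psi[:, state] @ w            # shape (k, n_actions)
    return int(q_values.max(axis=0).argmax())

for s in range(n_states):
    print(f"state {s}: GPI action {gpi_action(s)}")
```

With a single source policy (k = 1), the rule reduces to ordinary greedy action selection for that policy; the gain analyzed in the paper comes from taking the pointwise maximum across several transferred Q-functions before acting.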
Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang