
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning (2405.15920v2)

Published 24 May 2024 in cs.LG and stat.ML

Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned with deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, achieving both a faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
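The two ingredients named in the abstract can be stated concretely: the Q-function factorization Q^{pi_i}(s, a) = psi^{pi_i}(s, a)^T w and GPI action selection a(s) = argmax_a max_i Q^{pi_i}(s, a). The following is a minimal numeric sketch of that idea, not the paper's implementation; the array shapes, variable names (psi, w_target, n_tasks), and random values are illustrative assumptions only.

```python
# Minimal sketch of the SF decomposition and GPI rule described in the abstract.
# In SF-DQN the successor features come from a deep network; here they are
# random placeholders so the example runs standalone.
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_actions, d = 3, 4, 8          # hypothetical sizes (assumed)

# Successor features psi^{pi_i}(s, a): one d-dimensional vector per
# (source-task policy, action), evaluated at the current state s.
psi = rng.normal(size=(n_tasks, n_actions, d))

# Reward mapping w for the target task: r(s, a) = phi(s, a)^T w.
w_target = rng.normal(size=d)

# Q-function decomposition: Q^{pi_i}(s, a) = psi^{pi_i}(s, a)^T w_target.
q_values = psi @ w_target                # shape (n_tasks, n_actions)

# Generalized policy improvement (GPI): act greedily with respect to the best
# source-policy Q-value, a(s) = argmax_a max_i Q^{pi_i}(s, a).
gpi_action = int(np.argmax(q_values.max(axis=0)))
print("GPI action:", gpi_action)
```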

Authors (8)
  1. Shuai Zhang (319 papers)
  2. Heshan Devaka Fernando (1 paper)
  3. Miao Liu (98 papers)
  4. Keerthiram Murugesan (38 papers)
  5. Songtao Lu (60 papers)
  6. Pin-Yu Chen (311 papers)
  7. Tianyi Chen (139 papers)
  8. Meng Wang (1063 papers)
Citations (1)

