SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning (2401.03137v1)

Published 6 Jan 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Alleviating overestimation bias is a critical challenge for deep reinforcement learning to achieve successful performance on more complex tasks or offline datasets containing out-of-distribution data. To overcome overestimation bias, ensemble methods for Q-learning have been investigated to exploit the diversity of multiple Q-functions. Since network initialization has been the predominant approach for promoting diversity among Q-functions, heuristically designed diversity-injection methods have been studied in the literature. However, previous studies have not attempted to guarantee independence over an ensemble from a theoretical perspective. By introducing a novel regularization loss for Q-ensemble independence based on random matrix theory, we propose spiked Wishart Q-ensemble independence regularization (SPQR) for reinforcement learning. Specifically, we replace the intractable hypothesis-testing criterion for Q-ensemble independence with a tractable KL divergence between the spectral distribution of the Q-ensemble and the target Wigner's semicircle distribution. We implement SPQR in several online and offline ensemble Q-learning algorithms. In the experiments, SPQR outperforms the baseline algorithms in both online and offline RL benchmarks.
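The core idea in the abstract, matching a Q-ensemble's empirical spectral distribution to Wigner's semicircle law via a KL divergence, can be illustrated with a short sketch. The version below is a simplified NumPy illustration, not the authors' implementation: the choice of correlation matrix, the Wigner-style scaling, the histogram binning, and the function name `ensemble_spectral_kl` are all assumptions made for clarity. Under independence, a suitably scaled off-diagonal correlation matrix of the ensemble members behaves like a Wigner matrix, so its eigenvalues follow the semicircle law on [-2, 2]; strong dependence instead produces a spiked spectrum, which the KL term penalizes.

```python
import numpy as np


def semicircle_pdf(x, radius=2.0):
    """Wigner's semicircle density on [-radius, radius]."""
    inside = np.clip(radius ** 2 - np.asarray(x) ** 2, 0.0, None)
    return 2.0 / (np.pi * radius ** 2) * np.sqrt(inside)


def ensemble_spectral_kl(q_values, n_bins=24, support=3.0, eps=1e-8):
    """KL(empirical spectral distribution || semicircle law) for an ensemble.

    q_values: (N, B) array of Q-values, N ensemble members over a batch of B states.
    A small KL indicates the members look mutually independent; a spiked
    (dependent) ensemble pushes eigenvalues outside the semicircle support
    and inflates the KL. This is an illustrative proxy, not the paper's exact loss.
    """
    q = np.asarray(q_values, dtype=float)
    n, b = q.shape
    # Standardize each member's outputs across the batch.
    z = q - q.mean(axis=1, keepdims=True)
    z = z / (z.std(axis=1, keepdims=True) + eps)
    # Cross-correlation matrix of members; drop the trivial diagonal.
    m = z @ z.T / b
    np.fill_diagonal(m, 0.0)
    # Scale so independent members give entries of variance ~1/n,
    # putting the bulk spectrum on [-2, 2] (Wigner scaling).
    m = m * np.sqrt(b / n)
    eigs = np.clip(np.linalg.eigvalsh(m), -support, support)
    # Empirical spectral histogram vs. discretized semicircle target.
    edges = np.linspace(-support, support, n_bins + 1)
    counts, _ = np.histogram(eigs, bins=edges)
    p = counts / counts.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    tgt = semicircle_pdf(centers) * (edges[1] - edges[0]) + eps
    tgt = tgt / tgt.sum()
    return float(np.sum(p * np.log((p + eps) / tgt)))
```

In a full algorithm this term would be added, with a coefficient, to the usual Bellman loss; here it only demonstrates that an ensemble of mutually independent Q-outputs scores a much lower KL than an ensemble of identical copies.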

