Small batch deep reinforcement learning (2310.03882v1)
Published 5 Oct 2023 in cs.LG and cs.AI
Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests *reducing* the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
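To make the role of the batch size parameter concrete, below is a minimal sketch of the mechanism the abstract describes: a uniform replay memory from which `batch_size` transitions are sampled for each gradient update, together with the standard one-step TD targets. The names (`ReplayBuffer`, `td_targets`) and defaults are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Uniform replay memory storing (s, a, r, s', done) transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # batch_size is the parameter studied in the paper: it controls how
        # many transitions feed each gradient update of the value network.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones


def td_targets(rewards, next_q_values, dones, gamma: float = 0.99):
    """One-step TD targets: r + gamma * max_a' Q(s', a'), zeroed at terminals."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
```

For reference, DQN-style agents (including those built on Dopamine) conventionally use a batch size of 32; the intervention the paper studies is reducing this value below that common default rather than increasing it.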