Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning (2403.00514v2)
Abstract: Recent advances in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that let agents perform more gradient update steps than traditional methods allow. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms rather than against a range of regularization approaches. This limits our understanding of the specific mechanisms driving RL improvements. To address this, we implemented over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms. We tested these agents across 14 diverse tasks from two simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss -- the issues that motivate the examined regularization techniques. Our findings show that while the effectiveness of a specific regularization setup varies with the task, certain combinations consistently deliver robust, superior performance. Notably, a simple, appropriately regularized Soft Actor-Critic agent reliably finds a better-performing policy within the training regime, a result previously achieved mainly through model-based approaches.
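For concreteness, the sketch below (PyTorch; not the paper's actual code) illustrates two of the ingredients the abstract alludes to: layer normalization inside a SAC-style critic, one of the regularizers such studies examine, and the clipped double-Q target that tempers value overestimation. The class and function names, network width, and hyperparameter defaults are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RegularizedCritic(nn.Module):
    """SAC-style Q-network with layer normalization after each hidden layer.

    Illustrative sketch only: the study compares many regularizers
    (network resets, weight decay, spectral norm, ...); layer norm is
    just one of them, and the hidden width here is an assumption.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.LayerNorm(hidden),  # normalizes critic features each layer
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Q(s, a): concatenate state and action, return a scalar per sample.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def sac_td_target(reward, done, q1_targ, q2_targ, next_log_prob,
                  gamma: float = 0.99, alpha: float = 0.2):
    """Clipped double-Q target with SAC's entropy bonus.

    Taking the minimum over two target critics is the standard
    pessimistic correction against value overestimation; gamma and
    alpha defaults here are conventional, not taken from the paper.
    """
    q_min = torch.min(q1_targ, q2_targ)
    return reward + gamma * (1.0 - done) * (q_min - alpha * next_log_prob)
```

Regularizing the critic in this way is meant to keep value estimates stable when the agent takes many more gradient steps per environment interaction than a vanilla SAC agent would; the clipped double-Q minimum addresses the overestimation axis specifically.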