Maintaining Plasticity in Continual Learning via Regenerative Regularization (2308.11958v3)
Abstract: In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a simple approach for maintaining plasticity that adds to the loss function an L2 regularization term toward the initial parameters. L2 Init is very similar to standard L2 regularization; the only difference is that standard L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyperparameter. The motivation for this method is the same as that of methods that reset neurons or parameter values: intuitively, when recent losses are insensitive to particular parameters, those parameters should drift back toward their initial values, which prepares them to adapt quickly to new tasks. On problems representative of different types of non-stationarity in continual supervised learning, we demonstrate that L2 Init mitigates plasticity loss more consistently than previously proposed approaches.
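To make the difference from standard L2 regularization concrete, here is a minimal illustrative sketch (not the paper's code) of the L2 Init penalty on a toy linear regression problem; the variable names, the toy data, and the regularization strength `lam` are all assumptions chosen for illustration:

```python
import numpy as np

# Sketch of L2 Init: standard L2 pulls parameters toward zero, while
# L2 Init pulls them toward their *initial* values theta_0, so parameters
# the recent losses do not use drift back to a fresh initialization.

rng = np.random.default_rng(0)

def l2_init_loss(task_loss, theta, theta_0, lam):
    """Total loss = task loss + lam * ||theta - theta_0||^2."""
    return task_loss + lam * np.sum((theta - theta_0) ** 2)

# Toy regression: y = X @ w_true, trained by gradient descent.
w_true = np.array([1.5, -2.0])
X = rng.normal(size=(64, 2))
y = X @ w_true

theta_0 = rng.normal(scale=0.1, size=2)  # initial parameters, saved once
theta = theta_0.copy()
lam, lr = 0.01, 0.1  # single regularization hyperparameter, step size

for _ in range(200):
    pred = X @ theta
    grad_task = 2 * X.T @ (pred - y) / len(X)
    grad_reg = 2 * lam * (theta - theta_0)  # gradient of the L2 Init term
    theta -= lr * (grad_task + grad_reg)
```

With a small `lam`, the penalty barely perturbs the task solution, but on parameters the task loss ignores it steadily restores the initial values; swapping `theta_0` for a zero vector in `grad_reg` recovers standard L2 regularization.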