Directions of Curvature as an Explanation for Loss of Plasticity (2312.00246v4)

Published 30 Nov 2023 in cs.LG

Abstract: Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for loss of plasticity: neural networks lose directions of curvature during training, and loss of plasticity can be attributed to this reduction in curvature. To support this claim, we provide a systematic investigation of loss of plasticity across continual learning tasks using MNIST, CIFAR-10, and ImageNet. Our findings illustrate that loss of curvature directions coincides with loss of plasticity, while also showing that previous explanations are insufficient to explain loss of plasticity in all settings. Lastly, we show that regularizers which mitigate loss of plasticity also preserve curvature, motivating a simple distributional regularizer that proves to be effective across the problem settings we considered.
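The abstract's central quantity is the number of directions of curvature, i.e. how many non-negligible eigenvalues the Hessian of the training loss retains as learning progresses. As an illustration only (the paper's exact estimator is not reproduced here), the following PyTorch sketch forms the full parameter Hessian of a deliberately tiny network and counts eigenvalues whose magnitude exceeds a tolerance; the model, data, and cutoff `tol` are all illustrative choices. Tracking this count across tasks in a continual learning run is one way to observe the loss of curvature directions the abstract describes.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Deliberately tiny MLP so the full parameter Hessian is cheap to form exactly.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 4), torch.randint(0, 3, (32,))

names = [n for n, _ in model.named_parameters()]
shapes = [p.shape for p in model.parameters()]
flat0 = torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def loss_of_flat(flat):
    # Rebuild a parameter dict from the flat vector and evaluate the loss
    # with those parameters substituted into the model.
    params, offset = {}, 0
    for name, shape in zip(names, shapes):
        numel = math.prod(shape)
        params[name] = flat[offset:offset + numel].reshape(shape)
        offset += numel
    logits = torch.func.functional_call(model, params, (x,))
    return loss_fn(logits, y)

H = torch.autograd.functional.hessian(loss_of_flat, flat0)  # (P, P) parameter Hessian
eigvals = torch.linalg.eigvalsh(H)                          # real eigenvalues, ascending
tol = 1e-3 * eigvals.abs().max()                            # illustrative cutoff
num_directions = int((eigvals.abs() > tol).sum())
print(f"{num_directions} of {flat0.numel()} curvature directions above tolerance")
```

The abstract also mentions a "simple distributional regularizer" without spelling out its form here. A minimal sketch, assuming the regularizer penalizes drift of a layer's empirical weight distribution away from a reference sample (e.g. a frozen copy of the weights at initialization), can use the closed-form one-dimensional Wasserstein distance between sorted samples; `distributional_penalty`, `init_copies`, and the coefficient `lam` are hypothetical names introduced for this sketch, not the paper's.

```python
import torch

def distributional_penalty(weights: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """Squared 1-D Wasserstein-2 distance between the empirical distribution of
    `weights` and `reference` (same number of elements), computed in closed form
    by comparing sorted values. Illustrative only; not the paper's exact regularizer."""
    w = torch.sort(weights.reshape(-1)).values
    r = torch.sort(reference.reshape(-1)).values
    return ((w - r) ** 2).mean()

# Hypothetical use inside a training step, with `init_copies[name]` a frozen snapshot
# of each parameter at initialization and `lam` a tunable coefficient:
# loss = task_loss + lam * sum(
#     distributional_penalty(p, init_copies[name]) for name, p in model.named_parameters()
# )
```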
