Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias (2402.03991v1)

Published 6 Feb 2024 in cs.LG, cs.NA, math.NA, and stat.ML

Abstract: Recent work in deep learning has provided strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank, and removing relatively small singular values during training or from trained models can significantly reduce model size while maintaining or even improving performance. However, most theoretical investigations of low-rank bias in neural networks deal with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations trained with weight decay, and we show the presence of an intriguing neural rank collapse phenomenon connecting the low-rank bias of trained networks with their neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by a range of experiments illustrating the phenomenon.
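
To illustrate the claimed trend, here is a minimal sketch (not the authors' code): it trains a small MLP on synthetic class clusters with tight within-class variability under increasing weight decay, then reports the numerical rank of each weight matrix. The architecture, synthetic dataset, rank tolerance, and all hyperparameters are assumptions chosen for illustration only.

```python
# Illustrative sketch of the neural rank collapse trend: larger weight decay
# should yield lower numerical rank in trained weight matrices.
# All settings below are assumptions for demonstration, not the paper's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic 3-class data: tight Gaussian clusters (small within-class variability).
n_per_class, dim, n_classes = 100, 32, 3
means = torch.randn(n_classes, dim) * 3.0
X = torch.cat([means[c] + 0.1 * torch.randn(n_per_class, dim) for c in range(n_classes)])
y = torch.arange(n_classes).repeat_interleave(n_per_class)

def numerical_rank(W, tol=1e-2):
    """Count singular values above tol * largest (one common notion of numerical rank)."""
    s = torch.linalg.svdvals(W)
    return int((s > tol * s[0]).sum())

for wd in [0.0, 1e-3, 1e-2, 1e-1]:
    model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, n_classes))
    opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=wd)
    for _ in range(2000):  # full-batch SGD on the toy dataset
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    ranks = [numerical_rank(m.weight.detach()) for m in model if isinstance(m, nn.Linear)]
    print(f"weight_decay={wd:g}  layer ranks={ranks}  loss={loss.item():.3f}")
```

If the abstract's claim holds in this toy setting, the printed layer ranks should shrink as weight_decay grows, consistent with the neural rank collapse phenomenon described above.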
