The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features (2410.23169v1)

Published 30 Oct 2024 in cs.LG

Abstract: Modern deep neural networks have been observed to exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse. This phenomenon has also been noted in layers beyond the final one, an extension known as deep neural collapse. Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model, an approximation of an expressive network. This is attributed to a low-rank bias induced by regularization, which favors solutions of lower rank than those typically associated with deep neural collapse. In this work, we extend these observations to the cross-entropy loss and analyze how the low-rank bias influences various solutions. Additionally, we explore how this bias induces specific structures in the singular values of the weights at global optima. Furthermore, we examine the loss surface of these models and provide evidence that the frequent observation of deep neural collapse in practice, despite its suboptimality, may result from its higher degeneracy on the loss surface.
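
For readers who want to experiment with the setup described in the abstract, the sketch below is a minimal, deliberately simplified deep unconstrained features model: the first-layer features are free optimization variables, a stack of weight matrices produces the logits, and training minimizes cross-entropy plus weight decay. The depth, dimensions, and regularization strength are illustrative assumptions rather than the paper's settings, and the nonlinear variants are omitted; the final loop prints the singular values of each weight block so that any low-rank structure becomes visible.

```python
# Minimal illustrative sketch (not the authors' code) of a deep
# unconstrained features model (DUFM) trained with cross-entropy
# and weight decay. All hyperparameters are assumed for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, n, d = 4, 10, 32        # classes, samples per class, feature width (assumed)
L = 3                      # number of "peeled" layers (assumed)
lam = 5e-3                 # regularization strength on features and weights (assumed)

# Free variables of the model: unconstrained features H1 and weights W1, ..., WL.
H1 = torch.nn.Parameter(torch.randn(d, K * n))
Ws = [torch.nn.Parameter(0.1 * torch.randn(d, d)) for _ in range(L - 1)]
Ws.append(torch.nn.Parameter(0.1 * torch.randn(K, d)))  # last layer maps to K logits
labels = torch.arange(K).repeat_interleave(n)

opt = torch.optim.Adam([H1, *Ws], lr=1e-2)
for step in range(5000):
    Z = H1
    for W in Ws:           # logits = WL @ ... @ W2 @ W1 @ H1 (linear variant)
        Z = W @ Z
    loss = F.cross_entropy(Z.T, labels)
    reg = lam * (H1.pow(2).sum() + sum(W.pow(2).sum() for W in Ws))
    opt.zero_grad()
    (loss + reg).backward()
    opt.step()

# Deep-neural-collapse-style solutions keep roughly K (typically K-1) comparable
# singular values per block, whereas the low-rank bias discussed in the paper
# pushes most singular values toward zero.
for i, W in enumerate(Ws, 1):
    s = torch.linalg.svdvals(W.detach())
    print(f"W{i} top singular values:", [round(v, 3) for v in s[:6].tolist()])
```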

