Beyond Unconstrained Features: Neural Collapse for Shallow Neural Networks with General Data (2409.01832v2)

Published 3 Sep 2024 in stat.ML and cs.LG

Abstract: Neural collapse (NC) is a phenomenon that emerges during the terminal phase of training (TPT) of deep neural networks (DNNs): the features of data points in the same class collapse to their respective class sample means, and these sample means form a simplex equiangular tight frame (ETF). In the past few years, there has been a surge of work on explaining why NC occurs and how it affects generalization. Since DNNs are notoriously difficult to analyze, most works focus on the unconstrained feature model (UFM). While the UFM explains NC to some extent, it fails to provide a complete picture of how the network architecture and the dataset affect NC. In this work, we focus on shallow ReLU neural networks and try to understand how the width, depth, data dimension, and statistical properties of the training dataset influence neural collapse. We provide a complete characterization of when NC occurs for two- or three-layer neural networks. For two-layer ReLU neural networks, a sufficient condition for the global minimizer of the regularized empirical risk function to exhibit the NC configuration depends on the data dimension, sample size, and the signal-to-noise ratio (SNR) in the data rather than on the network width. For three-layer neural networks, we show that NC occurs as long as the first layer is sufficiently wide. Regarding the connection between NC and generalization, we show that generalization depends heavily on the SNR in the data: even if NC occurs, generalization can still be poor if the SNR is too low. Our results significantly extend the state-of-the-art theoretical analysis of NC under the UFM by characterizing the emergence of NC for shallow nonlinear networks and showing how it depends on data properties and network architecture.
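
The two defining properties of NC mentioned in the abstract (within-class features collapsing to their class means, and the centered class means forming a simplex ETF) are often checked empirically with simple summary statistics. The sketch below is our own illustration rather than code from the paper; the function name and the particular variability ratio used as an NC1 proxy are assumptions made for exposition.

```python
# Illustrative sketch (not from the paper): rough proxies for the two
# neural-collapse properties, computed from penultimate-layer features.
import numpy as np

def nc_metrics(features: np.ndarray, labels: np.ndarray):
    """features: (n, d) array of last-layer features; labels: (n,) class ids."""
    classes = np.unique(labels)
    K = len(classes)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # NC1 proxy: within-class variability relative to between-class variability.
    within = sum(((features[labels == c] - class_means[i]) ** 2).sum()
                 for i, c in enumerate(classes)) / features.shape[0]
    between = ((class_means - global_mean) ** 2).sum() / K
    nc1 = within / between  # tends to 0 under neural collapse

    # NC2 proxy: centered, normalized class means should form a simplex ETF,
    # i.e. unit-norm vectors with pairwise inner products -1/(K-1).
    centered = class_means - global_mean
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    gram = normed @ normed.T
    target = (K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    nc2 = np.linalg.norm(gram - target)  # tends to 0 under neural collapse

    return nc1, nc2
```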

Authors (2)
  1. Wanli Hong (4 papers)
  2. Shuyang Ling (22 papers)
Citations (2)
