Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay (2309.04644v3)

Published 9 Sep 2023 in cs.LG

Abstract: Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, in which last-layer feature vectors of the same class "collapse" to a single point while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, the training loss, and the presence of last-layer BN. Our experiments substantiate these theoretical insights, showing that models exhibit a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norms. Our findings offer a novel perspective on the role of BN and WD in shaping neural network features.
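To make the NC geometry concrete, the sketch below measures two standard signatures on a batch of last-layer features: within-class variability collapse via tr(Σ_W Σ_B^†)/C (the NC1 metric of Papyan et al.) and the average pairwise cosine between centered class means, which approaches -1/(C-1) for a simplex equiangular tight frame. This is a minimal illustration, not the paper's released code; the function name, array shapes, and NumPy implementation are assumptions.

```python
import numpy as np

def neural_collapse_metrics(features, labels):
    """Estimate NC signatures from last-layer features.

    features: (N, d) array of last-layer feature vectors
    labels:   (N,)  array of integer class labels
    Returns (NC1, mean off-diagonal cosine, ideal ETF cosine).
    """
    classes = np.unique(labels)
    C, d = len(classes), features.shape[1]
    global_mean = features.mean(axis=0)

    # Class means and their deviation from the global mean.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered_means = class_means - global_mean                    # (C, d)

    # Between-class covariance: spread of class means around the global mean.
    sigma_B = centered_means.T @ centered_means / C                # (d, d)

    # Within-class covariance: spread of features around their class mean.
    sigma_W = np.zeros((d, d))
    for c, mu in zip(classes, class_means):
        diffs = features[labels == c] - mu
        sigma_W += diffs.T @ diffs
    sigma_W /= features.shape[0]

    # NC1: within-class variability relative to between-class variability.
    nc1 = np.trace(sigma_W @ np.linalg.pinv(sigma_B)) / C

    # Equiangularity: pairwise cosines of centered class means should
    # approach -1/(C-1) as training enters the terminal phase.
    normed = centered_means / np.linalg.norm(centered_means, axis=1, keepdims=True)
    cosines = normed @ normed.T
    off_diag = cosines[~np.eye(C, dtype=bool)]
    return nc1, off_diag.mean(), -1.0 / (C - 1)
```

Under the paper's framing, one would expect NC1 and the gap between the measured and ideal cosines to shrink with last-layer BN, suitable WD, and lower training loss.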
