
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning (2405.15706v3)

Published 24 May 2024 in cs.LG

Abstract: Many of the recent remarkable advances in computer vision and language modeling can be attributed to the success of transfer learning via the pre-training of large foundation models. However, the theoretical framework explaining this empirical success remains incomplete and an active area of research. Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics which shed light on the implicit biases underlying pre-training. In this paper, we explore the geometric complexity of a model's learned representations as a fundamental mechanism that relates these two concepts. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence its neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting.
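
The abstract relates the geometric complexity of a network's learned representation map to the neural collapse of its class features. As a concrete illustration, below is a minimal sketch (not the authors' code) of how these two quantities are commonly measured, assuming the standard definitions from the geometric-complexity and neural-collapse literature: geometric complexity as the batch-averaged squared Frobenius norm of the network's input-output Jacobian (a discrete Dirichlet energy), and an NC1-style within-class variability metric, Tr(Sigma_W @ pinv(Sigma_B)) / C, computed on penultimate-layer features. The model, tensors, and function names are illustrative placeholders.

import torch
from torch.func import jacrev, vmap


def geometric_complexity(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Mean squared Frobenius norm of d model(x) / d x over the batch x.

    Assumes the model is compatible with torch.func transforms
    (e.g., call model.eval() first so batch-norm statistics are frozen).
    """
    # jacrev differentiates the per-example forward pass; vmap maps it over the batch.
    per_example_jac = vmap(jacrev(lambda xi: model(xi.unsqueeze(0)).squeeze(0)))(x)
    # per_example_jac has shape (batch, out_dim, *input_shape).
    return per_example_jac.flatten(start_dim=1).pow(2).sum(dim=1).mean()


def nc1_within_class_variability(features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Tr(Sigma_W @ pinv(Sigma_B)) / C on penultimate-layer features.

    Smaller values indicate stronger within-class variability collapse (NC1).
    """
    classes = labels.unique()
    global_mean = features.mean(dim=0)
    d = features.shape[1]
    sigma_w = torch.zeros(d, d, device=features.device, dtype=features.dtype)
    sigma_b = torch.zeros(d, d, device=features.device, dtype=features.dtype)
    for c in classes:
        class_feats = features[labels == c]
        mu_c = class_feats.mean(dim=0)
        centered = class_feats - mu_c
        sigma_w += centered.T @ centered / features.shape[0]
        diff = (mu_c - global_mean).unsqueeze(1)
        sigma_b += diff @ diff.T / len(classes)
    return torch.trace(sigma_w @ torch.linalg.pinv(sigma_b)) / len(classes)

Tracking both quantities over the course of pre-training is one way to probe the relationship the paper studies: per the abstract, mechanisms that affect the geometric complexity of the pre-trained network are expected to influence its neural collapse as well.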

Authors (3)
  1. Michael Munn (16 papers)
  2. Benoit Dherin (24 papers)
  3. Javier Gonzalvo (7 papers)