Guiding Neural Collapse: Optimising Towards the Nearest Simplex Equiangular Tight Frame (2411.01248v1)
Abstract: Neural Collapse (NC) is a recently observed phenomenon in neural networks that characterises the solution space of the final classifier layer when trained to zero training loss. Specifically, NC suggests that the final classifier layer converges to a Simplex Equiangular Tight Frame (ETF), which maximally separates the weights corresponding to each class. By duality, the penultimate layer feature means also converge to the same simplex ETF. Since this simple symmetric structure is optimal, our idea is to utilise this property to improve convergence speed. Concretely, we introduce the notion of the nearest simplex ETF geometry for the penultimate layer features at any given training iteration, by formulating it as a Riemannian optimisation problem. Then, at each iteration, the classifier weights are implicitly set to the nearest simplex ETF by solving this inner optimisation problem, which is encapsulated within a declarative node to allow backpropagation. Our experiments on synthetic and real-world architectures for classification tasks demonstrate that our approach accelerates convergence and enhances training stability.
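To make the nearest simplex ETF idea concrete, below is a minimal NumPy sketch (not the authors' implementation; `simplex_etf` and `nearest_rotated_etf` are hypothetical names). It builds the canonical K-class simplex ETF and aligns it to a given set of penultimate-layer class means over a semi-orthogonal rotation; as a simplification, this alignment is solved with a closed-form orthogonal Procrustes step, standing in for the Riemannian inner optimisation described in the abstract.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Canonical K-class simplex ETF: K unit-norm columns with pairwise
    inner product -1/(K-1) (rank K-1)."""
    K = num_classes
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)


def nearest_rotated_etf(class_means: np.ndarray) -> np.ndarray:
    """Given penultimate-layer class means H (d x K), return the rotated
    simplex ETF U @ M (d x K) closest to the centred means in Frobenius
    norm, over semi-orthogonal U (U^T U = I_K).  The closed-form Procrustes
    step below is an illustrative stand-in for the Riemannian optimisation
    used in the paper."""
    d, K = class_means.shape
    M = simplex_etf(K)
    H = class_means - class_means.mean(axis=1, keepdims=True)  # centre the means
    # Minimising ||U M - H||_F^2 subject to U^T U = I is equivalent to
    # maximising trace(U^T H M^T); a thin SVD of H M^T gives U = A B^T.
    A, _, Bt = np.linalg.svd(H @ M.T, full_matrices=False)
    U = A @ Bt
    return U @ M  # e.g. usable as fixed classifier weights for this iteration


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K = 64, 10
    H = rng.normal(size=(d, K))   # stand-in for penultimate-layer class means
    W = nearest_rotated_etf(H)
    G = W.T @ W                   # Gram matrix: ~1 on the diagonal, ~-1/(K-1) off it
    print(np.round(G[:3, :3], 3))
```

The closed-form step is available here only because this illustrative cost is a plain Frobenius alignment; the inner problem in the paper is instead solved as a Riemannian optimisation and wrapped in a declarative node so that gradients can flow back to the features during backpropagation.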