Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching (2306.07960v2)
Abstract: Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically, by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments conducted across various datasets, architectures, and imbalance scenarios corroborate our finding. Importantly, our experiments reveal that the inclusion of the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometric characterization of SCL under imbalances. Additionally, our analysis and experiments underscore the pivotal role of batch selection strategies in representation geometry. By proving necessary and sufficient conditions for mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.
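To make the architectural intervention described in the abstract concrete, below is a minimal PyTorch sketch of a projection head whose final activation is a ReLU, so the embeddings fed to the standard supervised contrastive loss are entry-wise non-negative. This is an illustrative sketch under assumptions, not the paper's exact implementation: the class and function names, layer sizes, temperature value, and the placement of L2 normalization after the ReLU are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReLUProjectionHead(nn.Module):
    """Projection head whose *final* activation is a ReLU, so the embeddings
    passed to the SCL loss are entry-wise non-negative (the modification the
    paper studies). Layer sizes are illustrative assumptions."""

    def __init__(self, in_dim=512, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(),
            nn.Linear(in_dim, feat_dim),
            nn.ReLU(),  # final ReLU: entry-wise non-negative features
        )

    def forward(self, x):
        z = self.net(x)
        # SCL operates on unit-norm embeddings; normalization preserves non-negativity.
        return F.normalize(z, dim=1)


def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Standard SCL (Khosla et al., 2020) for one batch of L2-normalized
    embeddings z of shape [n, d] with integer labels of shape [n]."""
    n = z.shape[0]
    sim = (z @ z.t()) / temperature                     # pairwise similarities
    # Exclude self-similarities from the softmax denominator.
    logits_mask = ~torch.eye(n, dtype=torch.bool, device=z.device)
    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & logits_mask
    sim = sim.masked_fill(~logits_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over positives, skipping anchors with no positives.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    mean_log_prob_pos = sum_log_prob_pos[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

In a training loop one would compute `z = head(encoder(x))` followed by `loss = supervised_contrastive_loss(z, y)`; relative to a standard SCL pipeline, the only change is the final ReLU before normalization. The batch-binding strategy mentioned in the abstract concerns how mini-batches are composed and is not sketched here.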