Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching (2306.07960v2)

Published 13 Jun 2023 in cs.LG and stat.ML

Abstract: Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalance. This paper presents an intriguing discovery: introducing a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments across various datasets, architectures, and imbalance scenarios corroborate this finding. Importantly, the experiments reveal that including the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometric characterization of SCL under imbalance. Additionally, our analysis and experiments underscore the pivotal role of batch-selection strategies in representation geometry. By proving necessary and sufficient conditions on mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.
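
The abstract's central intervention, appending a ReLU at the final layer so that the learned features are entry-wise non-negative, composes directly with the standard supervised contrastive loss. Below is a minimal PyTorch sketch of that combination, not the authors' code: the names `NonNegativeHead` and `scl_loss` and all dimensions are illustrative assumptions, while the loss itself follows the standard SCL formulation of Khosla et al. (2020).

```python
import torch
import torch.nn.functional as F

class NonNegativeHead(torch.nn.Module):
    """Hypothetical final layer: a ReLU after the last linear map yields
    entry-wise non-negative features, the intervention the abstract describes."""

    def __init__(self, in_dim: int, feat_dim: int):
        super().__init__()
        self.fc = torch.nn.Linear(in_dim, feat_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(h))  # entry-wise non-negative embeddings

def scl_loss(features: torch.Tensor, labels: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss (Khosla et al., 2020) over one mini-batch.
    features: (B, d) embeddings; labels: (B,) integer class labels."""
    z = F.normalize(features, dim=1)                   # project onto the unit sphere
    logits = z @ z.T / temperature                     # (B, B) pairwise similarities
    B = z.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, float("-inf"))    # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~eye  # same-class pairs, minus self
    pos_count = pos.sum(dim=1)
    valid = pos_count > 0                              # skip anchors with no positive
    mean_log_prob = log_prob.masked_fill(~pos, 0.0).sum(dim=1)[valid] / pos_count[valid]
    return -mean_log_prob.mean()

# Usage sketch: backbone outputs and labels are random stand-ins.
head = NonNegativeHead(in_dim=512, feat_dim=128)
h = torch.randn(32, 512)
y = torch.randint(0, 10, (32,))
loss = scl_loss(head(h), y)
loss.backward()
```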
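
The abstract also proposes batch-binding to guarantee that mini-batch choices satisfy its conditions for invariant symmetric representations. The abstract does not spell out the mechanism, so the sketch below is one plausible reading and an assumption on our part: bind a fixed set of examples, one per class, to every mini-batch so that each class appears in each batch. Both helper names are hypothetical.

```python
import random
from collections import defaultdict

def make_bound_set(labels):
    """Pick one representative index per class to bind to every batch.
    Hypothetical helper; the exact binding rule is given in the paper."""
    per_class = defaultdict(list)
    for idx, y in enumerate(labels):
        per_class[y].append(idx)
    return [random.choice(idxs) for idxs in per_class.values()]

def bind_batches(batches, bound_set):
    """Append the bound set to each mini-batch of sample indices, so that
    every class is represented in every batch."""
    return [list(batch) + list(bound_set) for batch in batches]
```

Under this reading, the overhead is one extra example per class per batch, which would keep the strategy cheap relative to re-drawing whole batches until the conditions hold.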

