Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features (2402.18884v1)
Abstract: Recent findings reveal that over-parameterized deep neural networks, when trained past the point of zero training error, exhibit a distinctive structural pattern at the final layer, termed Neural Collapse (NC). Specifically, the final hidden-layer outputs of such networks display minimal within-class variation over the training set. While existing research extensively investigates this phenomenon under the cross-entropy loss, far fewer studies address its contrastive counterpart, the supervised contrastive (SC) loss. Through the lens of NC, this paper takes an analytical approach to studying the solutions obtained by minimizing the SC loss. We adopt the unconstrained features model (UFM) as a representative proxy for unveiling NC-related phenomena in sufficiently over-parameterized deep networks. We show that, despite the non-convexity of SC loss minimization, all local minima are global minima. Moreover, the minimizer is unique up to a rotation. We prove these results by formalizing a tight convex relaxation of the UFM. Finally, through this convex formulation, we further characterize the properties of global solutions under label-imbalanced training data.
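For concreteness, one common form of the SC loss analyzed in this line of work is the formulation of Khosla et al. (2020); the sketch below uses illustrative notation ($h_i$ for feature embeddings, $P(i)$ for the indices of other examples sharing example $i$'s label, and $\tau$ for a temperature parameter) that is not taken from the paper itself:

$$
\mathcal{L}_{\mathrm{SC}}(h_1,\dots,h_N)
  \;=\; \sum_{i=1}^{N} \frac{-1}{|P(i)|}
    \sum_{p \in P(i)}
    \log \frac{\exp\!\left(h_i^{\top} h_p / \tau\right)}
              {\sum_{a \neq i} \exp\!\left(h_i^{\top} h_a / \tau\right)}.
$$

Under the UFM, the feature vectors $h_i$ are treated as free optimization variables (typically subject only to a norm constraint), abstracting away the network that produces them; the landscape results summarized in the abstract concern minimizing this loss over those free features.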