Analysis of Using Sigmoid Loss for Contrastive Learning (2402.12613v1)

Published 20 Feb 2024 in cs.LG

Abstract: Contrastive learning has been a prominent branch of self-supervised learning for several years. In particular, CLIP, which applies contrastive learning to large sets of captioned images, has garnered significant attention. Recently, SigLIP, a variant of CLIP, was proposed; it uses the sigmoid loss instead of the standard InfoNCE loss. SigLIP achieves performance comparable to CLIP more efficiently by eliminating the need for a global view of the batch. However, the theoretical understanding of using the sigmoid loss in contrastive learning remains underexplored. In this paper, we provide a theoretical analysis of the sigmoid loss in contrastive learning from the perspective of the geometric structure of the learned embeddings. First, we propose the double-Constant Embedding Model (CCEM), a framework that parameterizes various well-known embedding structures by a single variable. Interestingly, the proposed CCEM is proven to contain the optimal embedding with respect to the sigmoid loss. Second, we mathematically analyze the optimal embedding that minimizes the sigmoid loss for contrastive learning. Depending on the temperature parameter used in the sigmoid loss, the optimal embedding ranges from a simplex equiangular tight frame to an antipodal structure. Third, our experimental results on synthetic datasets coincide with our theoretical results on the optimal embedding structures.
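
To make the two losses discussed in the abstract concrete, here is a minimal NumPy sketch (not the authors' code) of the pairwise sigmoid loss used by SigLIP alongside the standard symmetric InfoNCE loss used by CLIP. The temperature `t`, bias `b`, and batch/embedding sizes are illustrative assumptions; the point is that the sigmoid loss treats each (image, text) pair as an independent binary classification, so it needs no batch-wide softmax normalization (no "global view"), unlike InfoNCE.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def sigmoid_contrastive_loss(img, txt, t=10.0, b=-10.0):
    """SigLIP-style pairwise sigmoid loss (illustrative t and b).

    img, txt: (n, d) L2-normalized embeddings; row i of each is a matched pair.
    Every (i, j) pair is an independent binary classification problem.
    """
    n = img.shape[0]
    logits = t * img @ txt.T + b          # (n, n) scaled pairwise similarities
    labels = 2.0 * np.eye(n) - 1.0        # +1 on the diagonal, -1 elsewhere
    return -log_sigmoid(labels * logits).sum() / n

def infonce_loss(img, txt, tau=0.07):
    """Symmetric InfoNCE (CLIP-style) loss; each term normalizes over the
    whole row/column, i.e., it requires a global view of the batch."""
    n = img.shape[0]
    logits = img @ txt.T / tau
    log_p_i2t = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    log_p_t2i = logits - np.logaddexp.reduce(logits, axis=0, keepdims=True)
    return -(np.trace(log_p_i2t) + np.trace(log_p_t2i)) / (2 * n)

# Toy usage on random, normalized embeddings.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(8, 16)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(sigmoid_contrastive_loss(img, txt), infonce_loss(img, txt))
```

The paper's analysis concerns the minimizers of the sigmoid objective above: as its temperature varies, the optimal embedding configuration moves between a simplex equiangular tight frame and an antipodal arrangement.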

Authors (3)
  1. Chungpa Lee (4 papers)
  2. Joonhwan Chang (2 papers)
  3. Jy-yong Sohn (37 papers)