
Improving Contrastive Learning of Sentence Embeddings with Focal-InfoNCE (2310.06918v2)

Published 10 Oct 2023 in cs.CL and cs.LG

Abstract: The recent success of SimCSE has greatly advanced state-of-the-art sentence representations. However, the original formulation of SimCSE does not fully exploit the potential of hard negative samples in contrastive learning. This study introduces an unsupervised contrastive learning framework that combines SimCSE with hard negative mining, aiming to enhance the quality of sentence embeddings. The proposed focal-InfoNCE function introduces self-paced modulation terms into the contrastive objective, downweighting the loss associated with easy negatives and encouraging the model to focus on hard negatives. Experiments on various STS benchmarks show that our method improves sentence embeddings in terms of Spearman's correlation and representation alignment and uniformity.
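
The abstract describes the mechanism only at a high level; the exact form of the self-paced modulation terms is defined in the paper. As a rough illustration of the general idea, the PyTorch sketch below applies a hypothetical focal-style weight to each in-batch negative in the unsupervised SimCSE objective, so that easy negatives (low cosine similarity to the anchor) contribute little while hard negatives keep near-full weight. The function name, the specific weight ((cos + 1)/2)^gamma, and the default hyperparameters are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def focal_infonce(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.05, gamma: float = 2.0) -> torch.Tensor:
    # z1, z2: (N, d) embeddings of the same N sentences encoded under two
    # independent dropout masks (the unsupervised SimCSE setup). Diagonal
    # pairs are positives; all off-diagonal pairs are in-batch negatives.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    cos = z1 @ z2.t()                            # (N, N), values in [-1, 1]
    eye = torch.eye(cos.size(0), dtype=torch.bool, device=cos.device)

    # Hypothetical focal-style modulation: map cosine similarity to [0, 1]
    # and raise it to gamma, so easy negatives (low similarity) are
    # downweighted and hard negatives (high similarity) keep near-full weight.
    w = ((cos + 1.0) / 2.0).pow(gamma)
    w = torch.where(eye, torch.ones_like(w), w)  # leave positives unmodulated

    exp = torch.exp(cos / temperature) * w       # modulated InfoNCE terms
    loss = -torch.log(exp.diagonal() / exp.sum(dim=1))
    return loss.mean()
```

The modulation plays the same role as the (1 - p)^gamma factor in focal loss (reference 16): a term's contribution shrinks as it becomes easier, steering the gradient signal toward hard negatives.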

References (29)
  1. SemEval-2015 Task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 252–263.
  2. SemEval-2014 Task 10: Multilingual semantic textual similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 81–91.
  3. SemEval-2016 Task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), pages 497–511.
  4. SemEval-2012 Task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 385–393.
  5. *SEM 2013 shared task: Semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the main conference and the shared task: semantic textual similarity, pages 32–43.
  6. SemEval-2017 Task 1: Semantic textual similarity - multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055.
  7. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.
  8. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR.
  9. DiffCSE: Difference-based contrastive learning for sentence embeddings. arXiv preprint arXiv:2204.10298.
  10. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  11. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821.
  12. DeCLUTR: Deep contrastive learning for unsupervised textual representations. arXiv preprint arXiv:2006.03659.
  13. Improving adversarial robustness with self-paced hard-class pair reweighting. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence.
  14. PromptBERT: Improving BERT sentence embeddings with prompts. arXiv preprint arXiv:2201.04337.
  15. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673.
  16. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988.
  17. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  18. A SICK cure for the evaluation of compositional distributional semantic models. In LREC, pages 216–223, Reykjavik.
  19. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4004–4012.
  20. Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
  21. Contrastive learning with hard negative samples. arXiv preprint arXiv:2010.04592.
  22. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823.
  23. Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6398–6407.
  24. Feng Wang and Huaping Liu. 2021. Understanding the behaviour of contrastive loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2495–2504.
  25. Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning, pages 9929–9939. PMLR.
  26. Smoothed contrastive learning for unsupervised sentence embedding. arXiv preprint arXiv:2109.04321.
  27. ESimCSE: Enhanced sample building method for contrastive learning of unsupervised sentence embedding. arXiv preprint arXiv:2109.04380.
  28. ConSERT: A contrastive framework for self-supervised sentence representation transfer. arXiv preprint arXiv:2105.11741.
  29. Debiased contrastive learning of unsupervised sentence representations. arXiv preprint arXiv:2205.00656.
Authors (2)
  1. Pengyue Hou (3 papers)
  2. Xingyu Li (104 papers)
Citations (3)