Contrastive Learning in Distilled Models (2401.12472v1)

Published 23 Jan 2024 in cs.CL

Abstract: Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on Semantic Textual Similarity, and may be too large to be deployed in lightweight edge applications. We apply a contrastive learning method based on the SimCSE paper to a model architecture adapted from a knowledge-distillation-based model, DistilBERT, to address these two issues. Our final lightweight model, DistilFace, achieves an average Spearman's correlation of 72.1 on STS tasks, a 34.2 percent improvement over BERT base.
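
The paper's exact training recipe is not reproduced on this page, so the following is only a minimal sketch of the kind of setup the abstract describes: unsupervised SimCSE-style contrastive learning (two dropout-perturbed encodings of the same sentence as the positive pair, InfoNCE loss) on top of a DistilBERT encoder with mean pooling. The pooling strategy, temperature, and training data shown here are assumptions, not DistilFace's reported configuration.

```python
# Sketch of SimCSE-style contrastive training on DistilBERT (assumed setup, not
# the paper's exact DistilFace recipe).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.train()  # keep dropout active so two passes over the same input differ

def embed(sentences):
    """Encode sentences and mean-pool the last hidden states (assumed pooling)."""
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state            # (B, T, H)
    mask = inputs["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # (B, H)

def simcse_loss(sentences, temperature=0.05):
    """InfoNCE over two dropout-perturbed views; positives lie on the diagonal."""
    z1 = embed(sentences)                                    # view 1
    z2 = embed(sentences)                                    # view 2 (different dropout noise)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0))                       # i-th sentence matches itself
    return F.cross_entropy(sim, labels)

loss = simcse_loss(["A man is playing guitar.", "Kids are playing outdoors."])
loss.backward()
```

At evaluation time, sentence embeddings produced by `embed` would be compared with cosine similarity and scored against STS gold labels via Spearman's correlation.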

References (20)
  1. Federated learning in edge computing: A systematic survey. Sensors, 22(2):450, Jan 2022. url: http://dx.doi.org/10.3390/s22020450.
  2. A simple framework for contrastive learning of visual representations, 2020, arXiv:2002.05709.
  3. A. Conneau and D. Kiela. SentEval: An evaluation toolkit for universal sentence representations, 2018, arXiv:1803.05449.
  4. BERT: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv:1810.04805.
  5. M. Farahmand. Pre-trained word embeddings or embedding layer?, 2019. url: link.
  6. SimCSE: Simple contrastive learning of sentence embeddings, 2021, arXiv:2104.08821.
  7. Learning deep representations by mutual information estimation and maximization, 2018, arXiv:1808.06670.
  8. Robust semantic text similarity using LSA, machine learning, and linguistic resources. Language Resources & Evaluation, 50, 2016. url: link.
  9. Z. Kira. CS7643 Lecture 12: Masked language models, 2022. url: Lecture 12.
  10. Z. Kira. CS7643 Lecture 18: Unsupervised and self-supervised learning, 2022. url: Lecture 18.
  11. On the sentence embeddings from pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9119–9130, Online, Nov. 2020. Association for Computational Linguistics. url: https://aclanthology.org/2020.emnlp-main.733.
  12. Probabilistic contrastive loss for self-supervised learning, 2021, arXiv:2112.01642.
  13. RoBERTa: A robustly optimized BERT pretraining approach, 2019, arXiv:1907.11692.
  14. N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019, arXiv:1908.10084.
  15. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv:1910.01108.
  16. Contrastive distillation on intermediate representations for language model compression, 2020, arXiv:2009.14167.
  17. M. S. Tony Peng. The staggering cost of training SOTA AI models, 2019. url: link.
  18. H. Xiao. Why not the last hidden layer? Why second-to-last?, 2018. url: Why not the last hidden layer.
  19. ConSERT: A contrastive framework for self-supervised sentence representation transfer, 2021, arXiv:2105.11741.
  20. Temperature as uncertainty in contrastive learning, 2021, arXiv:2110.04403.
