Contrastive Learning in Distilled Models (2401.12472v1)
Published 23 Jan 2024 in cs.CL
Abstract: Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on Semantic Textual Similarity (STS) and may be too large to deploy in lightweight edge applications. We apply a contrastive learning method based on the SimCSE paper to a model architecture adapted from a knowledge-distillation-based model, DistilBERT, to address these two issues. Our final lightweight model, DistilFace, achieves an average Spearman's correlation of 72.1 on STS tasks, a 34.2 percent improvement over BERT base.
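The abstract describes pairing SimCSE-style unsupervised contrastive learning with a DistilBERT encoder. The sketch below illustrates that general recipe, not the paper's exact DistilFace setup: the checkpoint name (distilbert-base-uncased), first-token pooling, and the temperature of 0.05 are assumptions for illustration. Following the unsupervised SimCSE formulation, two dropout-perturbed forward passes of the same sentence form a positive pair, and the other sentences in the batch act as negatives.

```python
# Minimal sketch of SimCSE-style unsupervised contrastive learning on DistilBERT.
# Assumptions (not taken from the paper): the Hugging Face "distilbert-base-uncased"
# checkpoint, first-token pooling, and temperature 0.05.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.train()  # keep dropout active: two passes over the same sentence give two "views"

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, dim)
    return hidden[:, 0]                           # first-token pooling (assumption)

def simcse_loss(sentences, temperature=0.05):
    z1 = embed(sentences)                         # view 1 (dropout mask A)
    z2 = embed(sentences)                         # view 2 (dropout mask B)
    # Pairwise cosine similarities; positives sit on the diagonal.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0))
    return F.cross_entropy(sim, labels)           # InfoNCE over in-batch negatives

loss = simcse_loss(["a man is playing guitar", "two dogs run on the beach"])
loss.backward()
```

In this formulation the temperature and pooling strategy are key hyperparameters; the paper's reported STS results will depend on its own choices for both.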
- Federated learning in edge computing: A systematic survey. Sensors, 22(2):450, Jan 2022. url: http://dx.doi.org/10.3390/s22020450.
- T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations, 2020, arXiv:2002.05709.
- A. Conneau and D. Kiela. SentEval: An evaluation toolkit for universal sentence representations, 2018, arXiv:1803.05449.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv:1810.04805.
- M. Farahmand. Pre-trained word embeddings or embedding layer?, 2019.
- T. Gao, X. Yao, and D. Chen. SimCSE: Simple contrastive learning of sentence embeddings, 2021, arXiv:2104.08821.
- R. D. Hjelm et al. Learning deep representations by mutual information estimation and maximization, 2018, arXiv:1808.06670.
- Robust semantic text similarity using LSA, machine learning, and linguistic resources. Language Resources & Evaluation, 50, 2016.
- Z. Kira. CS7643 Lecture 12: Masked language models, 2022.
- Z. Kira. CS7643 Lecture 18: Unsupervised and self-supervised learning, 2022.
- On the sentence embeddings from pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9119–9130, Online, Nov. 2020. Association for Computational Linguistics. url: https://aclanthology.org/2020.emnlp-main.733.
- Probabilistic contrastive loss for self-supervised learning, 2021, arXiv:2112.01642.
- Y. Liu et al. RoBERTa: A robustly optimized BERT pretraining approach, 2019, arXiv:1907.11692.
- N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019, arXiv:1908.10084.
- V. Sanh, L. Debut, J. Chaumond, and T. Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv:1910.01108.
- Contrastive distillation on intermediate representations for language model compression, 2020, arXiv:2009.14167.
- M. S. Tony Peng. The staggering cost of training SOTA AI models, 2019.
- H. Xiao. Why not the last hidden layer? Why second-to-last?, 2018.
- Y. Yan et al. ConSERT: A contrastive framework for self-supervised sentence representation transfer, 2021, arXiv:2105.11741.
- Temperature as uncertainty in contrastive learning, 2021, arXiv:2110.04403.