Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (1908.10084v1)

Published 27 Aug 2019 in cs.CL

Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) has set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods.

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

The paper "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks" by Nils Reimers and Iryna Gurevych addresses a critical challenge in NLP: the derivation of semantically meaningful sentence embeddings that are computationally efficient for tasks such as semantic similarity search, clustering, and information retrieval. By modifying the BERT architecture to use siamese and triplet networks, they propose Sentence-BERT (SBERT), a method that creates fixed-size sentence embeddings which can be easily compared using cosine-similarity.

Introduction

BERT and its variant RoBERTa have shown excellent performance on a range of NLP tasks, including sentence-pair regression tasks such as Semantic Textual Similarity (STS). These models, however, score sentence pairs with a cross-encoder: both sentences must be fed into the network together, so every pair of sentences requires its own forward pass, which makes them impractical for large-scale comparison, clustering, or retrieval. The paper introduces SBERT to address this limitation: each sentence is encoded independently in a single pass, and the resulting embeddings can then be compared with cheap vector operations such as cosine similarity.
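
To make the cost concrete, the abstract's figure of roughly 50 million inference computations for 10,000 sentences is simply the number of unordered sentence pairs a cross-encoder must score:

```latex
\underbrace{\frac{n(n-1)}{2}}_{\text{cross-encoder passes}}
  = \frac{10{,}000 \cdot 9{,}999}{2} \approx 5 \times 10^{7}
  \quad (\text{about 65 hours with BERT}),
\qquad \text{versus } n = 10^{4} \text{ encoding passes with SBERT.}
```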

Model and Methodology

SBERT adds a pooling operation to the output of BERT to derive fixed-size sentence embeddings. These embeddings are fine-tuned with siamese and triplet network structures so that semantically similar sentences end up close together in the vector space. Three training objectives are used, depending on the available training data (a short code sketch follows the list):

  • Classification Objective Function: The two sentence embeddings u and v are concatenated with their element-wise difference |u - v|, multiplied by a trainable weight matrix, and passed through softmax; training uses cross-entropy loss.
  • Regression Objective Function: The cosine similarity between u and v is computed and optimized with mean-squared-error loss against gold similarity scores.
  • Triplet Objective Function: A triplet loss pushes the embedding of an anchor sentence closer to a positive (similar) sentence than to a negative (dissimilar) one by at least a margin.
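
A minimal sketch of the three objectives in PyTorch, assuming u, v, anchor, positive, and negative are pooled sentence embeddings produced by the shared encoder; the classifier weight shape, label format, and margin value are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def classification_objective(u, v, classifier_weight, labels):
    # Concatenate u, v, and |u - v|, project with a trainable weight
    # matrix of shape (3 * dim, num_labels), and apply cross-entropy.
    features = torch.cat([u, v, torch.abs(u - v)], dim=-1)  # (batch, 3 * dim)
    logits = features @ classifier_weight                   # (batch, num_labels)
    return F.cross_entropy(logits, labels)

def regression_objective(u, v, gold_scores):
    # Cosine similarity between the two embeddings, trained with
    # mean squared error against gold similarity scores.
    cos = F.cosine_similarity(u, v, dim=-1)
    return F.mse_loss(cos, gold_scores)

def triplet_objective(anchor, positive, negative, margin=1.0):
    # Keep the anchor at least `margin` closer (in Euclidean distance)
    # to the positive example than to the negative one.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```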

The pooling layer experiments included CLS-token output, mean-pooling, and max-pooling, with mean-pooling generally yielding the best results.
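
A minimal sketch of mean-pooling over the encoder's token outputs, assuming the Hugging Face transformers library; the attention mask keeps padding tokens out of the average, and the checkpoint name is illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any BERT-like encoder is pooled the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def mean_pool(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, dim)
    # Zero out padding positions, then average over the real tokens only.
    mask = batch["attention_mask"].unsqueeze(-1).float()        # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts                                      # (batch, dim)
```

During fine-tuning the same pooling is applied without the no_grad context, so gradients flow back into the encoder.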

Evaluation

The evaluation covered both unsupervised and supervised tasks across multiple datasets such as STS12-16, STS benchmark, SICK-R, and more. The results showed that SBERT significantly outperforms traditional BERT embeddings, as well as other sentence embedding methods like InferSent and Universal Sentence Encoder.

Unsupervised STS Tasks

SBERT demonstrated superior performance on several STS tasks, outperforming InferSent by 11.7 points and Universal Sentence Encoder by 5.5 points on average. Just as importantly, the computational cost drops sharply: finding the most similar pair in a 10,000-sentence collection falls from about 65 hours with BERT to roughly 5 seconds once SBERT embeddings are computed and compared with cosine similarity.
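
The speedup comes from encoding each sentence exactly once and then comparing cached vectors. A minimal usage sketch with the sentence-transformers library that accompanies this work; the checkpoint name is illustrative, so consult the library for currently available models.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint name; any SBERT-style model can be substituted.
model = SentenceTransformer("bert-base-nli-mean-tokens")

sentences = [
    "A man is playing a guitar.",
    "Someone is strumming an instrument.",
    "The weather is cold today.",
]

# Each sentence is encoded once (n forward passes rather than n * (n - 1) / 2).
embeddings = model.encode(sentences, convert_to_tensor=True)

# All pairwise comparisons collapse into one matrix of cosine scores.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```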

Supervised STS with STS Benchmark

For supervised tasks, fine-tuning SBERT on the STS Benchmark data achieved comparable or better results than state-of-the-art methods, emphasizing the effectiveness of SBERT's architecture in various contexts.

Specialized Evaluations

SBERT was also evaluated on more specialized datasets like the Argument Facet Similarity (AFS) corpus and Wikipedia Sections Distinction dataset, further solidifying its standing as a robust and versatile method for generating high-quality sentence embeddings:

  • AFS Corpus: SBERT nearly matched BERT’s performance in 10-fold cross-validation but showed a performance drop in cross-topic evaluation, highlighting the challenge for generalization across diverse topics.
  • Wikipedia Sections Distinction: SBERT achieved high accuracy, outperforming previous approaches and demonstrating its capacity for fine-grained semantic understanding.

SentEval Toolkit

Using the SentEval toolkit, SBERT also excelled on various sentiment and classification tasks, surpassing InferSent and Universal Sentence Encoder in most tasks. This demonstrates SBERT's versatility beyond mere semantic similarity tasks.

Computational Efficiency

Efficiency tests showed that, on a GPU with the paper's smart batching strategy (grouping sentences of similar length to minimize padding), SBERT encodes sentences faster than both InferSent and Universal Sentence Encoder.
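
A sketch of the length-based (smart) batching idea behind much of that gain: sorting sentences by length before forming mini-batches means each batch is padded only to its own longest member. The function name and batch size are illustrative.

```python
def smart_batches(sentences, batch_size=32):
    # Sort indices by sentence length so similarly sized sentences share a batch,
    # which keeps padding (and wasted computation) to a minimum.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i].split()))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Yield the original indices as well, so results can be re-ordered later.
        yield [sentences[i] for i in idx], idx
```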

Conclusion

SBERT represents a significant step in sentence embedding methods by combining the strengths of BERT with the computational efficiency of siamese and triplet network structures. SBERT delivers high-quality embeddings that can be efficiently computed and effectively applied across a broad range of NLP tasks. The work underscores not only the practical implications for large-scale semantic tasks but also the ongoing evolution of efficient NLP models.

Overall, SBERT demonstrates substantial improvements in both the performance and scalability of sentence embeddings, making it a valuable tool for researchers and practitioners alike. Future directions could involve further refinements and adaptations of SBERT to address broader sets of NLP challenges and datasets.
