Unsupervised Sentence Embedding Method by Mutual Information Maximization
The paper introduces a method for learning sentence embeddings without any labeled data. It targets a known inefficiency of BERT in sentence-pair regression tasks: every candidate pair must be fed jointly through the network, so scoring a collection of sentences requires a quadratic number of forward passes. Sentence-BERT (SBERT) addressed this by producing standalone sentence embeddings, but it depends on high-quality labeled sentence pairs for training, limiting its applicability when labeled data is scarce. The authors instead derive meaningful sentence embeddings in an unsupervised manner through mutual information (MI) maximization, eliminating the dependency on labeled datasets.
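The cost argument can be made concrete with a quick back-of-the-envelope calculation; the corpus size below is illustrative, not taken from the paper.

```python
# Rough cost of pairwise scoring for a hypothetical corpus of 10,000 sentences.
n = 10_000

# Cross-encoder setup: every sentence pair passes jointly through BERT.
cross_encoder_passes = n * (n - 1) // 2   # 49,995,000 forward passes

# Embedding setup: each sentence is encoded once, then pairs are compared
# with a cheap vector similarity such as cosine.
bi_encoder_passes = n                     # 10,000 forward passes

print(cross_encoder_passes, bi_encoder_passes)
```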
Methodology
The primary contribution lies in extending BERT with a self-supervised objective that maximizes the MI between a global sentence embedding and the local context embeddings of that sentence. The method applies convolutional neural networks (CNNs) with varying kernel sizes to BERT's token embeddings to obtain n-gram representations, which encode richer semantic structure than simple average-pooled token embeddings. By maximizing MI, the encoder is encouraged to capture global contextual aspects that are uniquely representative of each sentence, with negative sampling over other sentences providing the contrastive signal.
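A minimal PyTorch sketch of this setup is given below. It is not the authors' implementation: the class and function names, kernel sizes, output dimension, and the softplus-based scoring function (a Jensen-Shannon-style MI lower bound in the spirit of Deep InfoMax) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NgramMIEncoder(nn.Module):
    """Sketch of a CNN head over BERT token embeddings producing local n-gram
    features and a pooled global sentence embedding (illustrative settings)."""

    def __init__(self, hidden=768, out_dim=300, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One 1-D convolution per kernel size turns token embeddings into
        # local n-gram representations of dimension out_dim.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, out_dim, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, token_emb, mask):
        # token_emb: (batch, seq, hidden) BERT outputs; mask: (batch, seq) attention mask.
        x = token_emb.transpose(1, 2)                        # (batch, hidden, seq)
        local = torch.stack([F.relu(c(x)) for c in self.convs]).mean(0)
        local = local.transpose(1, 2)                        # (batch, seq, out_dim)
        # Global sentence embedding: masked mean pooling of the local features.
        m = mask.unsqueeze(-1).float()
        global_emb = (local * m).sum(1) / m.sum(1).clamp(min=1)
        return local, global_emb


def jsd_mi_loss(local, global_emb, mask):
    """Jensen-Shannon-style MI objective: a sentence's own local features are
    positives for its global embedding; features from other sentences in the
    batch act as negatives (negative sampling)."""
    # scores[b, c, t] = similarity between global embedding b and token t of sentence c.
    scores = torch.einsum("bd,ctd->bct", global_emb, local)
    pos_mask = torch.eye(scores.size(0), device=scores.device).unsqueeze(-1)
    valid = mask.unsqueeze(0).float()
    pos = -F.softplus(-scores)          # reward matching sentence/feature pairs
    neg = F.softplus(scores)            # penalize mismatched pairs
    pos_term = (pos * pos_mask * valid).sum() / (pos_mask * valid).sum()
    neg_term = (neg * (1 - pos_mask) * valid).sum() / ((1 - pos_mask) * valid).sum()
    return -(pos_term - neg_term)       # minimize to maximize the MI bound
```

In training, the token embeddings would come from BERT (frozen or fine-tuned), and minimizing the loss over mini-batches lets the other sentences in each batch serve as the negative samples.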
Experimental Results
The paper presents empirical evidence for the method across several evaluation settings. The proposed model significantly surpasses other unsupervised sentence embedding models on semantic textual similarity (STS) benchmarks and is competitive with supervised models on certain tasks:
- Semantic Textual Similarity (STS): Across several STS evaluations, the model outperformed unsupervised baselines and rivaled supervised models such as InferSent and USE in some settings, achieving a higher average Spearman rank correlation than unsupervised alternatives despite using no labeled data (the evaluation protocol is sketched after this list).
- Argument Facet Similarity (AFS): On this domain-specific dataset, the model outperformed both BERT and SBERT without using any labels, demonstrating its flexibility and reinforcing the benefit of unsupervised domain-specific training.
- Supervised Evaluations: On a set of supervised SentEval tasks, the model achieved results competitive with supervised systems, indicating that it produces effective sentence embeddings even though no labels are used when the embeddings are learned.
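For reference, the STS numbers above follow the standard embedding-evaluation protocol, sketched here; `embed` is a placeholder for any sentence encoder and is not part of the paper's code.

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(embed, sentence_pairs, gold_scores):
    """Score each sentence pair by the cosine similarity of its embeddings and
    rank-correlate the scores with human judgements (Spearman's rho).
    `embed` maps a list of sentences to an (n, d) numpy array."""
    a = embed([s1 for s1, _ in sentence_pairs])
    b = embed([s2 for _, s2 in sentence_pairs])
    cosine = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return spearmanr(cosine, gold_scores).correlation
```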
Implications and Future Work
The implications of this research are twofold. Practically, it offers a way to obtain effective sentence embeddings in domains where labeled data is sparse or expensive, broadening the applicability of transfer learning in NLP. Theoretically, it encourages further exploration of representation learning, especially the use of mutual information maximization for more expressive embeddings. The authors also propose investigating semi-supervised extensions, which could further improve transferability and performance across varied domains.
To conclude, the paper demonstrates a substantial advance in unsupervised sentence representation learning, achieving strong results through a novel application of MI maximization. The approach not only challenges methodologies centered on supervised learning but also offers a practical alternative for diverse and domain-specific applications in NLP.