
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations (2109.13059v4)

Published 27 Sep 2021 in cs.CL, cs.AI, and cs.LG

Abstract: In NLP, a large volume of tasks involve pairwise comparison between two sequences (e.g. sentence similarity and paraphrase identification). Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient, however, they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance but they require task fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model termed as Trans-Encoder that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, on top of a pre-trained language model (PLM), we start with converting it to an unsupervised bi-encoder, and then alternate between the bi- and cross-encoder task formulations. In each alternation, one task formulation will produce pseudo-labels which are used as learning signals for the other task formulation. We then propose an extension to conduct such a self-distillation approach on multiple PLMs in parallel and use the average of their pseudo-labels for mutual-distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder and also a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks.

Trans-Encoder: Unsupervised Sentence-Pair Modelling through Self- and Mutual-Distillations

The paper "Trans-Encoder: Unsupervised Sentence-Pair Modelling through Self- and Mutual-Distillations" presents an innovative approach in the field of NLP that addresses the challenge of sentence-pair comparison. It introduces the Trans-Encoder framework, a methodology that synthesizes both bi-encoder and cross-encoder architectures into an iterative learning scheme.

Background

In the NLP landscape, sentence-pair tasks such as assessing semantic textual similarity, paraphrase identification, and natural language inference are fundamental. The prevalent architectures for these tasks are bi-encoders and cross-encoders. Bi-encoders compute fixed-dimensional sentence embeddings, allowing for efficient comparison but often at the cost of performance when compared to cross-encoders, which model inter-sentence interactions but suffer from computational inefficiencies and require task-specific fine-tuning.
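
To make the distinction concrete, the sketch below contrasts the two formulations using Hugging Face transformers. The model name, mean-pooling choice, and the untrained scoring head are illustrative assumptions for exposition, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative choices, not the paper's exact setup.
MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def mean_pool(hidden, mask):
    # Average token embeddings, ignoring padding positions.
    mask = mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

@torch.no_grad()
def bi_encoder_score(sent_a, sent_b):
    # Bi-encoder: encode each sentence independently, then compare
    # the fixed-dimensional embeddings with cosine similarity.
    batch = tokenizer([sent_a, sent_b], padding=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state
    emb = mean_pool(hidden, batch["attention_mask"])
    return torch.cosine_similarity(emb[0:1], emb[1:2]).item()

@torch.no_grad()
def cross_encoder_score(sent_a, sent_b, head):
    # Cross-encoder: feed the concatenated pair so self-attention spans both
    # sentences, then map the [CLS] representation to a scalar score.
    batch = tokenizer(sent_a, sent_b, return_tensors="pt")
    cls = encoder(**batch).last_hidden_state[:, 0]
    return torch.sigmoid(head(cls)).item()

head = torch.nn.Linear(encoder.config.hidden_size, 1)  # untrained, for illustration only
print(bi_encoder_score("A man is playing guitar.", "Someone plays a guitar."))
print(cross_encoder_score("A man is playing guitar.", "Someone plays a guitar.", head))
```

The bi-encoder can cache embeddings and compare millions of pairs cheaply, while the cross-encoder must run a full forward pass per pair, which is the efficiency/performance trade-off the paper sets out to bridge.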

Trans-Encoder Framework

The Trans-Encoder model seeks to leverage the strengths of both architectures through a self- and mutual-distillation methodology. This is achieved entirely in an unsupervised manner: a pre-trained language model (PLM) is first converted into an effective bi-encoder using contrastive learning techniques akin to SimCSE and Mirror-BERT. The core innovation lies in the iterative cycle of transferring knowledge between the bi- and cross-encoder formulations. Each encoder alternately acts as a teacher, generating pseudo-labels that refine the other. This reciprocal process capitalizes on the bi-encoder's efficiency and the cross-encoder's interaction-modelling capability.
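
A simplified view of one distillation round is sketched below. The `similarity` and `score` methods are hypothetical interfaces (a differentiable cosine score in [-1, 1] for the bi-encoder, an unnormalized pair logit for the cross-encoder), and MSE regression of rescaled scores is used as a stand-in distillation loss; the paper's exact losses and training details may differ.

```python
import torch
import torch.nn.functional as F

def bi_to_cross_step(bi_encoder, cross_encoder, pairs, optimizer):
    # Bi-encoder acts as the teacher: its similarities on unlabeled sentence
    # pairs become pseudo-labels for the cross-encoder student.
    with torch.no_grad():
        targets = torch.stack([bi_encoder.similarity(a, b) for a, b in pairs])
    logits = torch.stack([cross_encoder.score(a, b) for a, b in pairs])
    # Rescale cosine targets from [-1, 1] to [0, 1] (an illustrative choice)
    # and regress the cross-encoder's scores toward them.
    loss = F.mse_loss(torch.sigmoid(logits), (targets + 1) / 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def cross_to_bi_step(bi_encoder, cross_encoder, pairs, optimizer):
    # Roles are swapped: the refreshed cross-encoder now teaches the bi-encoder.
    with torch.no_grad():
        targets = torch.stack([torch.sigmoid(cross_encoder.score(a, b)) for a, b in pairs])
    preds = torch.stack([(bi_encoder.similarity(a, b) + 1) / 2 for a, b in pairs])
    loss = F.mse_loss(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Alternating these two steps over the unlabeled pair corpus for several
# rounds is the iterative cycle described above.
```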

An additional advancement proposed is mutual-distillation across multiple PLMs, effectively using ensemble learning to average pseudo-labels and mitigate errors that may arise from intrinsic model biases.
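
A minimal sketch of that averaging step, assuming each teacher (e.g. a BERT-based and a RoBERTa-based Trans-Encoder) exposes a hypothetical per-pair `score` method:

```python
import torch

def mutual_pseudo_labels(teachers, pairs):
    # Collect each teacher's scores on the same unlabeled pairs, then average.
    # Averaging smooths out idiosyncratic errors of any single model.
    with torch.no_grad():
        per_teacher = [torch.stack([t.score(a, b) for a, b in pairs]) for t in teachers]
    return torch.stack(per_teacher).mean(dim=0)
```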

Evaluation and Results

The empirical evaluation spans semantic textual similarity (STS) benchmarks and binary classification tasks such as Quora Question Pairs (QQP), QNLI, and MRPC. Trans-Encoder outperforms state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks. These outcomes underscore the practical value of Trans-Encoder for sentence-pair modelling without the need for labeled data.
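
STS benchmarks are conventionally scored with Spearman rank correlation between predicted similarities and human ratings. A minimal evaluation sketch follows, where the dataset iterator and the model's `similarity` method are assumed placeholders:

```python
from scipy.stats import spearmanr

def evaluate_sts(model, dataset):
    # `dataset` is assumed to yield (sentence1, sentence2, gold_score) triples;
    # `model.similarity` stands in for either the bi- or cross-encoder scorer.
    preds, golds = [], []
    for s1, s2, gold in dataset:
        preds.append(model.similarity(s1, s2))
        golds.append(gold)
    return spearmanr(preds, golds).correlation
```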

Implications and Future Directions

Trans-Encoder contributes both theoretically and practically by demonstrating that unsupervised learning can achieve results comparable to supervised techniques through careful use of self- and mutual-distillation. This work suggests promising future paths in unsupervised neural network training, particularly in enhancing the generalization capability of cross-encoder models. The domain adaptation observed in cross-encoder models across different tasks also points to potential applications in zero-shot learning contexts.

Looking forward, future developments could include exploring broader datasets for unsupervised training, further optimizing the mutual-distillation procedure, and extending the framework to domains beyond NLP. The findings here might inspire similar approaches in other fields where unsupervised learning hinges on efficiently leveraging architecture-specific strengths.

In conclusion, the Trans-Encoder framework signifies a meaningful advancement in unsupervised sentence-pair modeling, aligning with growing interests in bridging computational efficiency with high performance in NLP applications.

Authors (5)
  1. Fangyu Liu (59 papers)
  2. Yunlong Jiao (8 papers)
  3. Jordan Massiah (3 papers)
  4. Emine Yilmaz (66 papers)
  5. Serhii Havrylov (8 papers)
Citations (28)