- The paper presents a whitening transformation that normalizes BERT embeddings to alleviate anisotropy and improve semantic similarity.
- It applies PCA-like dimensionality reduction to enhance retrieval speed and cut computational costs while retaining robust performance.
- Experimental evaluations on seven semantic textual similarity datasets show results competitive with or superior to existing methods such as BERT-flow, in both unsupervised and supervised settings.
Whitening Sentence Representations for Better Semantics and Faster Retrieval
The research paper "Whitening Sentence Representations for Better Semantics and Faster Retrieval" introduces an approach that improves the sentence representations produced by BERT-based models through a simple whitening operation. The method is proposed to tackle the pervasive anisotropy of sentence embeddings while offering the additional benefits of reduced dimensionality, better semantic retrieval accuracy, and stronger performance on standard benchmarks.
Background and Motivation
Despite the notable success of pre-trained language models like BERT across NLP tasks, their sentence-level embeddings often suffer from anisotropy: the vectors crowd into a narrow cone of the embedding space, which makes cosine-similarity-based semantic comparison less reliable. Previous work, such as BERT-flow, addresses this with normalizing flows that map the embeddings to a Gaussian distribution. The authors of this paper explore a simpler yet effective post-processing method from traditional machine learning, known as whitening, to improve the isotropy of sentence embeddings.
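One way to make the anisotropy problem concrete is to measure how similar arbitrary sentence embeddings are to one another: in an anisotropic space, even unrelated sentences score a high cosine similarity. The following diagnostic is an illustrative sketch rather than a procedure from the paper; it estimates the average pairwise cosine similarity over a batch of embeddings, where values well above zero indicate a narrow cone.

```python
import numpy as np

def average_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of embeddings.

    Values near 1 suggest strong anisotropy (vectors crowd into a narrow cone);
    values near 0 suggest a more isotropic distribution.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T                                   # (n, n) cosine-similarity matrix
    n = sims.shape[0]
    return float((sims.sum() - n) / (n * (n - 1)))   # drop the diagonal of self-similarities
```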
Proposed Methodology: Whitening Transformation
The paper introduces a post-processing technique that applies a whitening transformation to sentence embeddings. Through this operation, the authors achieve two primary objectives:
- Isotropic Transformation: By shifting the mean of the sentence vectors to zero and transforming their covariance matrix into the identity matrix, the sentence representations become isotropic, which enables more reliable semantic similarity comparisons with cosine similarity.
- Dimensionality Reduction: Because the whitening matrix is derived from a spectral decomposition of the covariance matrix, keeping only its first k columns gives a natural, PCA-like dimensionality reduction that lowers storage costs and speeds up retrieval. The choice of k is key to maintaining robust performance while reducing memory usage and processing time (see the sketch after this list).
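In the paper, the transformation is x̃ = (x − μ)W, where μ is the mean of the sentence vectors and W = UΛ^(−1/2) is obtained from an SVD of their covariance matrix; keeping only the first k columns of W gives the dimensionality-reduced "whitening-k" variant. The sketch below follows that recipe in NumPy; the function names and the final L2 normalization are illustrative choices rather than part of the paper.

```python
import numpy as np

def compute_whitening(embeddings, k=None):
    """Estimate whitening parameters (mu, W) from sentence embeddings of shape (n, dim).

    If k is given, keep only the first k columns of W (PCA-like reduction to k dims).
    """
    mu = embeddings.mean(axis=0, keepdims=True)        # (1, dim) mean vector
    cov = np.cov((embeddings - mu).T)                  # (dim, dim) covariance matrix
    u, s, _ = np.linalg.svd(cov)                       # singular values sorted in descending order
    W = u @ np.diag(1.0 / np.sqrt(s))                  # maps the covariance matrix to the identity
    if k is not None:
        W = W[:, :k]                                   # retain only the top-k directions
    return mu, W

def apply_whitening(embeddings, mu, W):
    """Whiten embeddings and L2-normalize them so dot product equals cosine similarity."""
    x = (embeddings - mu) @ W
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

Typically, μ and W are estimated once over the sentence vectors of the corpus being indexed or evaluated and then applied to every embedding before cosine-similarity search.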
Experimental Evaluation
The effectiveness of the proposed approach is validated on seven standard semantic textual similarity datasets, where it matches or exceeds BERT-flow and other baselines. In the unsupervised setting, the whitening-enhanced BERT models performed best on several datasets, including STS-B, STS-12, and STS-14. In the supervised setting, the approach also improved STS Benchmark results, with the strongest performance obtained when the dimensionality was reduced to a well-chosen k.
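STS benchmarks are scored by correlating the model's similarity scores with human judgements, typically via Spearman correlation. The snippet below is a minimal sketch of that protocol, assuming the sentence pairs have already been encoded and whitened; the helper name `sts_spearman` is an illustrative assumption, not the paper's evaluation script.

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a, emb_b, gold_scores):
    """Spearman correlation between pairwise cosine similarities and gold STS scores."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cosine = (a * b).sum(axis=1)                 # row-wise cosine similarity for each pair
    return spearmanr(cosine, gold_scores).correlation
```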
Implications and Future Work
This research highlights a cost-effective way to refine pre-trained sentence embeddings and shows that classical machine-learning techniques can address inherent limitations of current deep-learning models. The method's ability to produce useful sentence representations in a lower-dimensional space has clear implications for real-time systems, where speed and memory efficiency are paramount.
Future work may focus on how to choose the dimensionality k for different NLP tasks so that performance and efficiency are balanced appropriately. Integrating the method with other pre-trained models, such as RoBERTa or GPT variants, could also extend its generality and applicability.
In conclusion, the whitening transformation emerges as a promising post-processing step for NLP pipelines: it makes anisotropic sentence embeddings more isotropic, improves performance on semantic matching tasks, and reduces computational cost at the same time.