Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
This paper explores strategies for deriving sentence embeddings from the Text-to-Text Transfer Transformer (T5), an architecture notable for its adaptability across diverse text-to-text tasks. The authors introduce Sentence-T5 (ST5), which extracts sentence embeddings from T5's encoder-decoder architecture, and evaluate three specific strategies: using the encoder's first token representation, averaging the encoder's output tokens, and using the first token representation from the decoder.
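To make the three strategies concrete, the sketch below uses HuggingFace's transformers library with the public t5-base checkpoint as a stand-in for the models in the paper; it is a minimal approximation of the pooling choices, not the authors' implementation.

```python
# Sketch of the three ST5 embedding strategies, using HuggingFace's
# public t5-base checkpoint as a stand-in for the models in the paper.
import torch
from transformers import AutoTokenizer, T5Model

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5Model.from_pretrained("t5-base").eval()

sentences = ["A girl is styling her hair.", "A girl is brushing her hair."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    # Encoder-only forward pass.
    enc_out = model.encoder(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
    ).last_hidden_state                      # (batch, seq_len, hidden)

    # Strategy 1: first encoder token (T5 has no CLS token, so this is
    # simply the representation of the first input token).
    first_tok = enc_out[:, 0, :]

    # Strategy 2: mean of encoder outputs over non-padding tokens.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    mean_pool = (enc_out * mask).sum(1) / mask.sum(1)

    # Strategy 3: first decoder token. T5 uses the pad token as the
    # decoder start token, so feed a single start token per sentence.
    start = torch.full(
        (len(sentences), 1), model.config.decoder_start_token_id
    )
    dec_out = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        decoder_input_ids=start,
    ).last_hidden_state                      # (batch, 1, hidden)
    dec_first = dec_out[:, 0, :]
```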
The research demonstrates significant gains in sentence embedding quality from scaling pre-trained models up to 11 billion parameters, outperforming existing techniques such as Sentence-BERT (SBERT) and SimCSE. These results matter for tasks that depend on robust sentence semantics, such as semantic textual similarity (STS) and classification: because scoring does not require full cross-attention over each query-candidate pair, retrieval and clustering become substantially more efficient.
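The efficiency argument follows from the dual-encoder setup: candidate embeddings can be precomputed once and each query scored with a normalized dot product rather than a full cross-attention pass per pair. The helper below is a minimal illustrative sketch; its name and signature are not from the paper.

```python
# Minimal sketch of dual-encoder scoring: candidate embeddings are
# precomputed once, and each query needs only a cosine-similarity
# lookup instead of a cross-attention pass over every pair.
import torch
import torch.nn.functional as F

def top_k(query_emb: torch.Tensor, candidate_embs: torch.Tensor, k: int = 5):
    """Return indices of the k most similar candidates for each query."""
    q = F.normalize(query_emb, dim=-1)        # (num_queries, hidden)
    c = F.normalize(candidate_embs, dim=-1)   # (num_candidates, hidden)
    scores = q @ c.T                          # cosine similarities
    return scores.topk(k, dim=-1).indices
```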
Key contributions of this paper include:
- The introduction and validation of three novel methodologies for generating sentence embeddings from T5, demonstrating that even without fine-tuning, encoder-only ST5 models excel on sentence transfer tasks, outperforming previously fine-tuned state-of-the-art models.
- Establishing a new state-of-the-art in sentence embedding-based STS through the encoder-decoder approach.
- Leveraging contrastive learning to further refine the sentence encoders, via a two-stage fine-tuning process on ReQA and then NLI datasets (see the sketch after this list).
- Development of SentGLUE as an extension of the SentEval toolkit, allowing a comprehensive comparison across challenging GLUE benchmark tasks, thereby providing a robust framework for evaluating transfer capabilities of sentence embeddings.
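The contrastive objective referenced above can be sketched as an in-batch softmax loss over paired embeddings (question-answer pairs for ReQA, entailment pairs for NLI). This is a hedged approximation of that style of loss; the function name and temperature value here are chosen for illustration, not taken from the paper.

```python
# Hedged sketch of an in-batch contrastive objective for a dual encoder.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchor: torch.Tensor,
                              positive: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """anchor, positive: (batch, hidden) embeddings of paired sentences."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature            # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))          # i-th anchor matches i-th positive
    return F.cross_entropy(logits, labels)    # other in-batch pairs act as negatives
```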
The methodological analysis shows that scaling ST5 to larger sizes not only improves transfer task performance but also substantially mitigates the anisotropy issues observed in smaller models such as BERT and RoBERTa. The authors further find that T5's encoder-decoder configuration yields the strongest results on semantic similarity evaluations without requiring auxiliary fine-tuning-specific tokens, refining the pooling strategies commonly employed in transformer-based sentence encoders.
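One crude way to probe the anisotropy claim is to measure the average pairwise cosine similarity of a set of embeddings: values close to 1.0 indicate the narrow-cone geometry associated with anisotropic representations. The diagnostic below is illustrative only and not taken from the paper.

```python
# Illustrative anisotropy diagnostic: mean pairwise cosine similarity
# of a batch of sentence embeddings (values near 1.0 suggest a
# narrow, anisotropic embedding cone).
import torch
import torch.nn.functional as F

def mean_pairwise_cosine(embeddings: torch.Tensor) -> float:
    e = F.normalize(embeddings, dim=-1)
    sims = e @ e.T
    n = e.size(0)
    off_diag = sims.sum() - sims.diagonal().sum()
    return (off_diag / (n * (n - 1))).item()
```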
Experimental results show performance matching or surpassing SBERT/SRoBERTa benchmarks across multiple transfer and semantic similarity tasks. These findings argue for continued scaling of model parameters, suggesting that future T5-like models could further strengthen sentence embeddings and enable richer, more context-sensitive representation learning beyond traditional task constraints.
From a practical standpoint, this approach enables the more efficient use of pre-trained models for computationally intensive applications such as semantic retrieval or clustering in large datasets, aligning with the industry's growing demand for scalable and efficient NLP solutions.
In conclusion, the work on Sentence-T5 offers a promising direction for future research, particularly in scaling text-to-text transformers and refining methods for extracting meaningful sentence representations from pre-trained models. This can lead to further advances in diverse NLP applications, including efficient large-scale semantic similarity assessment and better adaptation of large language models to specific language processing tasks.