Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
The paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data" by Conneau et al. presents a systematic approach to generating universal sentence embeddings through supervised learning on the Stanford Natural Language Inference (SNLI) dataset. The authors compare several neural sentence encoders and demonstrate the effectiveness of their approach on a range of transfer tasks.
Introduction
The challenge of deriving meaningful sentence-level representations has persisted despite the extensive utility and success of word embeddings like Word2Vec and GloVe. The authors investigate the potential of supervised learning from high-quality annotated data, specifically using the SNLI dataset, for producing universal sentence embeddings that generalize well across multiple NLP tasks.
Approach
Natural Language Inference Task
The SNLI corpus contains 570k human-annotated English sentence pairs labeled as entailment, contradiction, or neutral. Because the NLI task requires genuine semantic understanding of full sentences, it is a strong candidate for training general-purpose sentence embeddings. In the paper's setup, both sentences of a pair are encoded into fixed-size vectors by a shared encoder (the best-performing variant being a bi-directional Long Short-Term Memory network, BiLSTM, with max pooling), and the resulting embeddings are then examined for their transferability to other NLP tasks.
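On top of the shared encoder, the paper trains a small feed-forward classifier over the combination of the two sentence vectors u and v, concatenating u, v, their absolute difference, and their element-wise product. A minimal PyTorch sketch of this classifier (the hidden size and other hyperparameters are illustrative, not the paper's exact values):

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    """Feed-forward classifier over a pair of sentence embeddings.

    The premise and the hypothesis are encoded by the *same* sentence encoder
    into vectors u and v; the classifier sees [u; v; |u - v|; u * v],
    as described in the paper. The hidden size here is illustrative.
    """

    def __init__(self, encoder, embed_dim, hidden_dim=512, n_classes=3):
        super().__init__()
        self.encoder = encoder  # shared sentence encoder (e.g. BiLSTM-max)
        self.mlp = nn.Sequential(
            nn.Linear(4 * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, n_classes),  # entailment / contradiction / neutral
        )

    def forward(self, premise, hypothesis):
        u = self.encoder(premise)     # (batch, embed_dim)
        v = self.encoder(hypothesis)  # (batch, embed_dim)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        return self.mlp(features)
```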
Neural Architectures
The authors evaluate a suite of sentence encoding architectures:
- LSTM and GRU: recurrent encoders that use the last hidden state as the sentence vector, including a variant that concatenates the last hidden states of a forward and a backward GRU.
- BiLSTM with Mean/Max Pooling: aggregates the BiLSTM hidden states either by averaging them or by taking the element-wise maximum across time steps.
- Self-Attentive Network: applies attention over the BiLSTM hidden states, with multiple views to capture different salient parts of a sentence.
- Hierarchical Convolutional Network: combines max-pooled feature maps from several convolutional layers to capture varying levels of sentence abstraction.
All encoders take pre-trained GloVe word vectors as input and are trained on the SNLI classification task; the trained encoders are then used, with their parameters fixed, to produce sentence embeddings for the transfer tasks.
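As a concrete reference, here is a minimal PyTorch sketch of the best-performing encoder, a BiLSTM with max pooling over the hidden states (the hidden size is illustrative and padding masking is omitted for brevity):

```python
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """Bi-directional LSTM sentence encoder with max pooling over time.

    Pre-trained word vectors (e.g. GloVe) are fed through a BiLSTM, and the
    sentence embedding is the element-wise maximum over the hidden states.
    """

    def __init__(self, word_dim=300, hidden_dim=2048):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, word_dim) pre-trained embeddings
        hidden_states, _ = self.lstm(word_vectors)        # (batch, seq_len, 2 * hidden_dim)
        sentence_embedding, _ = hidden_states.max(dim=1)  # max over the time dimension
        return sentence_embedding                         # (batch, 2 * hidden_dim)
```

With hidden_dim=2048 the encoder outputs 4096-dimensional sentence vectors, matching the dimensionality reported for the paper's best model; wiring it into the classifier sketched earlier would look like `NLIClassifier(BiLSTMMaxEncoder(), embed_dim=4096)`.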
Evaluation
The embeddings are evaluated on a diverse set of 12 transfer tasks, spanning sentiment analysis, subjectivity classification, paraphrase detection, and semantic textual similarity tasks. Noteworthy among these tasks are:
- SICK-E and SICK-R: entailment classification and semantic relatedness regression on the SICK corpus, tasks closely related to SNLI.
- STS14: unsupervised semantic textual similarity, scored by correlating the cosine similarity of the two sentence embeddings with human judgments.
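Since STS14 involves no task-specific training, evaluation reduces to embedding each sentence with the frozen encoder and correlating cosine similarities with the human scores. A minimal sketch, assuming a hypothetical `encode` function that maps a list of sentences to a tensor of embeddings:

```python
import torch.nn.functional as F

def sts_cosine_scores(encode, sentence_pairs):
    """Score sentence pairs by the cosine similarity of their embeddings.

    `encode` is assumed to map a list of sentences to an (n, dim) tensor of
    sentence embeddings produced by a frozen encoder. The returned scores are
    then correlated (Pearson/Spearman) with the human similarity judgments.
    """
    left = encode([first for first, _ in sentence_pairs])
    right = encode([second for _, second in sentence_pairs])
    return F.cosine_similarity(left, right, dim=1)
```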
Further, the effectiveness of these embeddings is tested in a practical retrieval task using the COCO dataset, targeting both image and caption retrieval.
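The retrieval experiment follows earlier caption-image retrieval work, projecting caption embeddings and pre-computed image features into a joint space trained with a ranking objective. As an illustration only (not the paper's exact formulation), a common bidirectional margin ranking loss over a batch of aligned pairs could look like this; the margin value and the assumption of L2-normalized projections are mine:

```python
import torch

def bidirectional_ranking_loss(caption_emb, image_emb, margin=0.2):
    """Pairwise margin ranking loss over a batch of aligned caption/image pairs.

    caption_emb, image_emb: (batch, dim) L2-normalized projections into a
    shared space, where row i of each tensor is a matching pair. Mismatched
    rows act as negatives in both retrieval directions.
    """
    scores = caption_emb @ image_emb.t()    # (batch, batch) similarity matrix
    positives = scores.diag().unsqueeze(1)  # similarities of the true pairs
    # Penalize negatives that score within `margin` of the true pair,
    # once per retrieval direction, excluding the diagonal itself.
    cost_images = (margin + scores - positives).clamp(min=0)        # image given caption
    cost_captions = (margin + scores - positives.t()).clamp(min=0)  # caption given image
    off_diagonal = 1.0 - torch.eye(scores.size(0), device=scores.device)
    return ((cost_images + cost_captions) * off_diagonal).sum()
```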
Results
Performance on Transfer Tasks
The BiLSTM with max pooling stands out, showing substantial gains in transfer performance over previous methods such as SkipThought and FastSent:
- It improves results across a range of tasks, including sentiment and review classification (CR, SST), subjectivity and opinion-polarity classification (SUBJ, MPQA), and semantic textual similarity (STS14).
- The embeddings yield state-of-the-art performance on SICK-R with a Pearson correlation of 0.885 and SICK-E with an accuracy of 86.3%.
Comparison with Other Models
The BiLSTM-max model trained on SNLI consistently outperforms unsupervised approaches and models trained on other supervised tasks, demonstrating the value of high-quality NLI data for learning robust universal sentence embeddings. Notably, the SNLI-trained model also achieves stronger semantic textual similarity scores and better results on the COCO caption-related tasks than models trained directly on caption data.
Implications and Future Work
This paper underscores the potential of using high-quality supervised datasets for training generalized NLP models. The success of the BiLSTM-max architecture suggests promising avenues for leveraging various NLI datasets or even combining multiple datasets like MultiNLI to improve the generality and robustness of sentence embeddings further. Future research could explore additional architectural variants and larger, more diverse datasets to refine and scale the embedding process.
Conclusion
In conclusion, the paper highlights that supervised learning on NLI tasks, specifically using BiLSTM with max pooling, generates universal sentence representations that outperform existing unsupervised methods in a variety of NLP tasks. This approach sets a new benchmark for developing generalized sentence embeddings, and the accessible code and models facilitate further exploration and application in diverse NLP contexts.