Universal Paraphrastic Sentence Embeddings: An Evaluation of Compositional Architectures
This paper addresses the challenge of developing general-purpose sentence embeddings that capture paraphrastic meaning and transfer across diverse domains as readily as word embeddings do. Using supervision from the Paraphrase Database (PPDB), the research evaluates six compositional neural architectures, ranging from simple word averaging and deep averaging networks (DANs) to recurrent LSTM models.
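For intuition, the simplest of these architectures, word averaging, composes a sentence representation by averaging the vectors of its words. A minimal sketch follows; the function and variable names are illustrative, and the word vectors are assumed to be pretrained (e.g., 300-dimensional):

```python
import numpy as np

def average_embedding(sentence, word_vectors, dim=300):
    """Compose a sentence embedding by averaging its word vectors.

    `word_vectors` maps tokens to pretrained vectors (e.g., 300-d);
    out-of-vocabulary tokens are simply skipped in this sketch.
    """
    vectors = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)
```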
Key Findings
The evaluation drew textual similarity datasets from a variety of domains, including news, tweets, and image captions, to assess the robustness of each architecture. Notably, while complex models such as LSTMs performed strongly on in-domain tests, simpler architectures, particularly word averaging, performed best under diverse, out-of-domain conditions. These simpler models were not only competitive with systems specialized for individual tasks but also markedly more efficient.
Additional supervised NLP tasks, namely sentence similarity, entailment, and sentiment classification, further underscored the robustness of the word averaging approach. While that model excelled on sentence similarity and entailment, LSTM-based models performed best on sentiment classification, achieving a new state of the art on the Stanford Sentiment Treebank.
Methodology
Training leveraged the paraphrase pairs in PPDB, using a margin-based loss to pull paraphrase pairs together and push non-paraphrases apart in embedding space. Importantly, the paper highlights the role of pretrained word embeddings, noting significant improvement when those embeddings are further optimized for the sentence-level objective.
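The exact objective and negative-sampling strategy are described in the paper; as a rough sketch of this kind of margin-based loss (illustrative names, cosine similarity as the scoring function, one sampled negative example per side of the pair):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def margin_loss(x1, x2, neg1, neg2, margin=0.4):
    """Hinge loss encouraging a paraphrase pair (x1, x2) to be more
    similar to each other than to sampled negative examples.

    x1, x2     : embeddings of a paraphrase pair from PPDB
    neg1, neg2 : embeddings of sampled non-paraphrase sentences
    """
    pos = cosine(x1, x2)
    return (max(0.0, margin - pos + cosine(x1, neg1))
            + max(0.0, margin - pos + cosine(x2, neg2)))
```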
Interestingly, a distinct pattern emerged on supervised sentence similarity tasks: performance improved when the learned sentence embeddings were used either as a prior (regularizing task-specific models toward the paraphrase-trained parameters) or as a fixed feature extractor, yielding results comparable to the state of the art on established benchmarks.
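As an illustration of the feature-extractor setting, one common recipe (an assumption here, not a claim about the paper's exact pipeline) is to featurize a sentence pair with the element-wise product and absolute difference of its embeddings and feed those features to a simple regressor:

```python
import numpy as np
from sklearn.linear_model import Ridge

def pair_features(emb_a, emb_b):
    """Combine two fixed sentence embeddings into pair features."""
    return np.concatenate([emb_a * emb_b, np.abs(emb_a - emb_b)])

# Hypothetical usage: `pairs` is a list of (embedding, embedding) tuples
# and `y` holds gold similarity scores for each pair.
# X = np.stack([pair_features(a, b) for a, b in pairs])
# model = Ridge(alpha=1.0).fit(X, y)
```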
Implications and Future Directions
The findings have immediate implications for NLP applications that require robust sentence-level representations. The efficiency and competitiveness of word averaging offer a simple, strong baseline that researchers can build upon. The authors have made their resources publicly available, positioning these embeddings as a reference point for further research.
Future developments could explore hybrid models that integrate the strengths of simple baselines with more complex architectures to handle diverse NLP tasks more effectively. Additionally, addressing the under-training of embeddings for infrequent words could lead to improved performance across additional semantic tasks.
The research provides a compelling case for revisiting assumptions about model complexity in sentence embeddings, illustrating that simplicity does not preclude efficacy. This paper not only enriches the discourse on universal sentence embeddings but also sets a new baseline for future exploratory efforts in the domain.