- The paper presents a self-attentive mechanism that forms a 2-D matrix of sentence embeddings, capturing diverse semantic components.
- It integrates a bidirectional LSTM with self-attention, outperforming baselines in author profiling, sentiment analysis, and textual entailment.
- A penalization term enforces diversity among weight vectors, enhancing interpretability by highlighting key sentence parts.
A Structured Self-attentive Sentence Embedding
Introduction
The paper "A Structured Self-attentive Sentence Embedding" by Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio presents a novel model for generating interpretable sentence embeddings using a self-attention mechanism. The primary contribution of the work is the representation of sentence embeddings as a 2-D matrix, where each row of the matrix attends to different parts of the sentence. This model is evaluated on three distinct tasks: author profiling, sentiment classification, and textual entailment, demonstrating significant performance improvements over previous sentence embedding methods.
Proposed Model
The core of the proposed model is a self-attentive mechanism built on top of a bidirectional LSTM (BiLSTM). The model comprises two main components:
- Bidirectional LSTM: This component processes the input sentence to capture dependencies between adjacent words from both directions.
- Self-attention Mechanism: After the BiLSTM provides hidden states for each word, the self-attention mechanism generates a set of summation weight vectors. These vectors are used to compute weighted sums of the BiLSTM hidden states, resulting in multiple vector representations that collectively form a matrix embedding for the sentence.
The attention mechanism ensures that each vector representation within the matrix can focus on different aspects of the sentence. The resulting matrix embedding provides a rich and diverse representation by capturing various semantic components.
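The mechanism above can be sketched in a few lines of NumPy. Following the paper's formulation, the annotation matrix is A = softmax(W_s2 tanh(W_s1 H^T)), where H holds the BiLSTM hidden states and the matrix embedding is M = A H. The sizes below (n tokens, 2u hidden units, d_a attention units, r hops) are illustrative toy values, and H is random since no trained BiLSTM is involved here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Compute the matrix sentence embedding M = A @ H.

    H    : (n, 2u)   BiLSTM hidden states, one row per token
    W_s1 : (d_a, 2u) first attention weight matrix
    W_s2 : (r, d_a)  second weight matrix, one row per attention hop
    """
    # A has shape (r, n); each row is a softmax distribution over the n tokens
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)
    M = A @ H  # (r, 2u): r weighted sums of the hidden states
    return M, A

rng = np.random.default_rng(0)
n, two_u, d_a, r = 6, 8, 4, 3          # toy sizes for illustration
H = rng.standard_normal((n, two_u))    # stand-in for real BiLSTM outputs
W_s1 = rng.standard_normal((d_a, two_u))
W_s2 = rng.standard_normal((r, d_a))

M, A = structured_self_attention(H, W_s1, W_s2)
print(M.shape, A.shape)  # (3, 8) (3, 6)
```

Each of the r rows of A is a separate attention distribution over the tokens, so the r rows of M can attend to different parts of the sentence.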
Regularization
To address potential redundancy in the generated embeddings, the authors introduce a penalization term that encourages diversity across the summation weight vectors. The term is the squared Frobenius norm of the difference between the Gram matrix of the annotation matrix A and the identity matrix, P = ||AA^T - I||_F^2, which pushes the attention distributions in different rows of A to be distinct from one another (and each to stay focused).
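The penalty P = ||AA^T - I||_F^2 is straightforward to compute; a minimal sketch (the two toy annotation matrices are made up for illustration):

```python
import numpy as np

def attention_penalty(A):
    """Penalization P = ||A A^T - I||_F^2 on an (r, n) annotation matrix A.

    Off-diagonal entries of A @ A.T measure overlap between attention rows;
    diagonal entries near 1 reward focused (low-entropy) rows.
    """
    r = A.shape[0]
    G = A @ A.T - np.eye(r)
    return float(np.sum(G ** 2))

# Two rows attending to identical positions -> penalized
A_redundant = np.tile(np.array([[0.5, 0.5, 0.0]]), (2, 1))
# Two focused, non-overlapping rows -> zero penalty
A_diverse = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

print(attention_penalty(A_redundant))  # 1.0
print(attention_penalty(A_diverse))    # 0.0
```

During training this scalar is scaled by a coefficient and added to the task loss, so gradient descent trades task accuracy against diversity of the attention rows.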
Visualization
A notable feature of this approach is the ease of interpreting the extracted embeddings. By visualizing the annotation matrix, one can observe which specific sentence parts are captured by each row of the matrix embedding. This enhances the transparency and interpretability of the model's decisions.
Experimental Evaluation
The model was evaluated on three tasks using three datasets:
- Author Profiling (Age Dataset): This task involved predicting the age range of a Twitter user based on their English tweets. The proposed model achieved an accuracy of 80.45%, outperforming baselines like BiLSTM with max pooling (77.40%) and CNN with max pooling (78.15%).
- Sentiment Analysis (Yelp Dataset): For this five-class sentiment classification task, the model showed an accuracy of 64.21%, exceeding the performance of BiLSTM (61.99%) and CNN (62.05%).
- Textual Entailment (SNLI Dataset): This involved determining the logical relationship between pairs of sentences. The model achieved 84.4% accuracy, close to the state-of-the-art result of 84.6% achieved by the NSE encoders, and outperforming multiple other strong baselines.
Exploratory Experiments
The paper also discussed several exploratory experiments to investigate different components of the model:
- Effect of Penalization Term: Introducing the penalization term encouraged diversity among the weight vectors, improving the model's performance on the Age and Yelp datasets.
- Effect of Multiple Vectors: Varying the number of rows in the matrix embedding (the parameter r) consistently showed that having multiple rows significantly improves performance compared to a single vector representation.
Implications and Future Directions
The structured self-attentive sentence embedding model provides a versatile and interpretable method for sentence representation. Practical implications include better performance on various NLP tasks and the added benefit of interpretable embeddings. The theoretical implications suggest advancements in attention mechanisms and their integration with LSTM architectures.
Future work could explore extensions to unsupervised learning settings, improving the model's capability to handle longer sequences like paragraphs or documents, and experimenting with more complex attention mechanisms beyond weighted summations.
Conclusion
The paper successfully introduces a novel self-attentive mechanism for sentence embedding that significantly improves performance on multiple NLP tasks while providing interpretable representations. This work exemplifies the potential of attention mechanisms in enhancing both the performance and interpretability of sentence representations.