- The paper extends the LRP method to recurrent neural networks, enabling a detailed, word-level breakdown of sentiment predictions.
- It demonstrates that LRP offers clearer interpretability than sensitivity analysis by effectively distinguishing positive and negative contributions.
- Empirical results on a bi-directional LSTM using the Stanford Sentiment Treebank highlight LRP’s practical benefits in model transparency.
Explaining Recurrent Neural Network Predictions in Sentiment Analysis
The paper "Explaining Recurrent Neural Network Predictions in Sentiment Analysis" by Arras, Montavon, Müller, and Samek explores the application of Layer-wise Relevance Propagation (LRP) to recurrent neural networks (RNNs), specifically aimed at enhancing interpretability in sentiment analysis tasks. This work extends LRP to address the intricacies of recurrent architectures, such as Long Short-Term Memory (LSTM) networks, which are well-suited for capturing long-range dependencies in text.
Core Contributions
The authors extend the LRP method—a previously established technique for elucidating the decision-making process of feed-forward neural networks—by designing propagation rules that handle the multiplicative interactions inherent in RNN architectures like LSTMs and Gated Recurrent Units (GRUs). By applying these adaptations, LRP can effectively decompose prediction scores into word-level relevances, thereby highlighting which words contribute most significantly to the sentiment classification task.
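To make the propagation rules concrete, here is a minimal NumPy sketch, not the authors' reference implementation: an epsilon-stabilized redistribution through a weighted (linear) connection, and the rule for multiplicative gates in which the gate neuron receives no relevance and the signal neuron receives all of it. The function names, shapes, and the `eps` value are illustrative assumptions.

```python
import numpy as np

def lrp_linear(x, w, z_out, r_out, eps=0.001):
    """Epsilon-stabilized LRP through a weighted connection z_out = w @ x (+ bias).

    x:     input activations, shape (d_in,)
    w:     weight matrix, shape (d_out, d_in)
    z_out: pre-activation outputs the relevance refers to, shape (d_out,)
    r_out: relevance arriving at those outputs, shape (d_out,)
    Returns the relevance redistributed onto the inputs, shape (d_in,).
    """
    # A small signed stabilizer keeps the division well-defined when z_out is near zero.
    denom = z_out + eps * np.where(z_out >= 0, 1.0, -1.0)
    # Each input i receives a share of r_out[j] proportional to its contribution w[j, i] * x[i].
    return ((w * x[None, :]) / denom[:, None] * r_out[:, None]).sum(axis=0)

def lrp_gate(r_product):
    """Rule for a multiplicative interaction z = gate * signal (e.g. an LSTM gate):
    the entire relevance of the product flows to the signal neuron, none to the gate."""
    r_signal = r_product
    r_gate = np.zeros_like(r_product)
    return r_signal, r_gate
```

Treating the sigmoid gate as a switch rather than a contributor keeps the relevance conserved across the multiplicative connection while the gate still modulates the forward pass.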
This technique was applied to a bi-directional LSTM model trained on a five-class sentiment prediction task using the Stanford Sentiment Treebank dataset. The LRP explanations were compared against traditional gradient-based sensitivity analysis (SA), and the results showed that LRP produces the more interpretable relevance scores.
Methodological Insights
The paper describes two approaches to attributing a prediction to its input features: Sensitivity Analysis and Layer-wise Relevance Propagation:
- Sensitivity Analysis (SA): Scores each input feature by the squared partial derivative of the output with respect to that feature, obtained via standard gradient backpropagation. While computationally cheap, SA only measures how sensitive the output is to a feature; it cannot tell whether the feature supports or opposes the classification decision.
- Layer-wise Relevance Propagation (LRP): Redistributes the prediction score backward through the network layers while conserving relevance from layer to layer. The authors propose a dedicated rule for the multiplicative interactions in LSTM gates, yielding signed scores that separate positive from negative contributions to the sentiment decision (a minimal sketch of both scoring schemes follows this list).
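As a rough illustration of the difference between the two scores, assuming per-word gradients and per-word LRP relevances over the embedding dimensions are already available (the function names below are hypothetical):

```python
import numpy as np

def sa_word_relevance(grad_embeddings):
    """Sensitivity analysis: squared L2 norm of the gradient of the chosen class
    score with respect to each word embedding. Non-negative by construction,
    so it cannot indicate whether a word argued for or against the class."""
    return (grad_embeddings ** 2).sum(axis=1)  # shape (n_words,)

def lrp_word_relevance(relevance_embeddings):
    """LRP: sum the relevance propagated onto each embedding dimension per word.
    The result keeps its sign, separating supporting from opposing words."""
    return relevance_embeddings.sum(axis=1)  # shape (n_words,)
```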
Results and Empirical Findings
The paper presents both qualitative and quantitative evaluations. Qualitatively, heatmaps generated with LRP showed a clear demarcation of words contributing to specific sentiment classes, a distinction that was less evident with SA. Quantitatively, a word-deletion experiment showed that removing the words ranked most relevant by LRP degraded classification accuracy faster than removing those ranked by SA, confirming that LRP points to words the classifier genuinely relies on.
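A word-deletion evaluation of this kind could be sketched as follows; `model_predict` and the exact deletion policy are assumptions for illustration, not the paper's precise protocol:

```python
import numpy as np

def deletion_curve(model_predict, dataset, relevances, k_max=5):
    """Delete the k most relevant words from each sentence and track accuracy.

    model_predict(tokens) -> predicted class id   (assumed interface)
    dataset:    list of (tokens, label) pairs
    relevances: list of per-word relevance arrays, aligned with the tokens
    A faster accuracy drop means the relevance ranking singles out words
    the classifier actually depends on.
    """
    accuracies = []
    for k in range(k_max + 1):
        correct = 0
        for (tokens, label), rel in zip(dataset, relevances):
            drop = set(np.argsort(rel)[::-1][:k])          # k highest-relevance words
            kept = [t for i, t in enumerate(tokens) if i not in drop]
            correct += int(model_predict(kept) == label)
        accuracies.append(correct / len(dataset))
    return accuracies
```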
Theoretical and Practical Implications
By extending LRP to RNNs, this research enhances the transparency and interpretability of complex neural network models used in natural language processing. Such explanations are critical for validating model decisions in sensitive applications and for uncovering latent biases within datasets. This work paves the way for more refined model interpretation methods that could be adapted to other domains reliant on sequential data processing, including speech recognition and time-series forecasting.
Future Directions
The paper suggests several avenues for further research, such as extending LRP techniques to character-level models and exploring applications beyond NLP, like time-series or biological data. Additionally, integrating these interpretability methods with model development cycles could ensure more robust and transparent AI systems. As AI models continue to evolve, techniques that provide deeper insight into model decision-making will be essential for fostering trust and ensuring ethical AI deployments.
In conclusion, the paper's advancement of LRP for recurrent architectures marks a pivotal step in providing a detailed understanding of neural network predictions, crucial for the development of transparent AI systems.