Attention is Not Not Explanation: An Analytical Perspective
Introduction
In "Attention is not not Explanation," Sarah Wiegreffe and Yuval Pinter address the controversial claim that attention mechanisms in Recurrent Neural Networks (RNNs) do not offer meaningful explanations for model predictions. Their rebuttal responds directly to "Attention is not Explanation" by Jain and Wallace (2019), critically examining its assumptions and experimental setup. Wiegreffe and Pinter propose alternative methods for evaluating the interpretability of attention mechanisms, providing a rigorous analysis that contributes to the ongoing discussion of explainability in NLP.
Background and Motivation
The core of the controversy lies in whether attention weights can serve as explanations for model predictions. Jain and Wallace assert that if alternative attention distributions yield similar predictions, the original attention scores cannot be reliably used to explain the model's decision. This premise rests on two assumptions: that a faithful explanation should be consistent with other feature-importance measures, and that it should be exclusive, i.e., no alternative attention distribution should support the same prediction equally well. Wiegreffe and Pinter argue that the definition of explanation is more nuanced and context-dependent than these assumptions allow.
Methodological Contributions
The paper offers four alternative tests for evaluating attention as explanation:
- Uniform Weights Baseline: A baseline where attention weights are frozen to a uniform distribution.
- Variance Calibration with Random Seeds: Assessing the expected variance in attention weights by training multiple models with different random seeds.
- Diagnostic Framework: Utilizing frozen weights from pretrained models in a non-contextual Multi-Layer Perceptron (MLP) architecture.
- Adversarial Training Protocol: An end-to-end adversarial training protocol that modifies the loss function to consider the distance from the base model's attention scores.
Experimental Analysis and Results
Uniform Weights Baseline
The authors first test whether learned attention is necessary at all by comparing models whose attention weights are frozen to a uniform distribution against models with learned attention. They find that for datasets such as AG News and 20 Newsgroups, learned attention offers little to no improvement over the uniform baseline, indicating that these datasets are not well suited for probing the role of attention in explainability.
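As a rough sketch of how such a baseline can be set up, here is a minimal PyTorch example (not the authors' code; the class name `AttnClassifier` and all hyperparameters are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AttnClassifier(nn.Module):
    """LSTM encoder + additive attention + linear classifier.

    With uniform_attention=True the attention weights are frozen to 1/T
    (a uniform distribution over the T tokens) instead of being learned.
    """
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_classes=2,
                 uniform_attention=False):
        super().__init__()
        self.uniform_attention = uniform_attention
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.attn_scorer = nn.Linear(hid_dim, 1)        # additive attention scorer
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, tokens):                          # tokens: (B, T)
        h, _ = self.encoder(self.embed(tokens))         # h: (B, T, hid_dim)
        if self.uniform_attention:
            T = h.size(1)
            alpha = torch.full(h.shape[:2], 1.0 / T, device=h.device)
        else:
            alpha = torch.softmax(self.attn_scorer(h).squeeze(-1), dim=-1)
        context = torch.einsum("bt,bth->bh", alpha, h)  # attention-weighted sum
        return self.classifier(context), alpha
```

Comparing the test performance of the two variants on each dataset reproduces the spirit of this first test: if the frozen-uniform model matches the learned-attention model, attention is doing little work on that dataset.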
Variance with Random Seeds
To assess the normal variance in attention distributions, the authors train multiple models with different random seeds. They show that some datasets, like SST, exhibit robust attention distributions despite random variations, while others, like Diabetes, show significant variability. This highlights the need to consider background stochastic variation when evaluating adversarial results.
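A minimal sketch of this seed-variance measurement, assuming the illustrative `AttnClassifier` above plus user-supplied `model_factory` and `train_fn` helpers (all names here are hypothetical):

```python
import torch

def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two attention distributions of shape (T,)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: torch.sum(a * (torch.log(a + eps) - torch.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def seed_variance(model_factory, train_fn, instances, seeds=(0, 1, 2, 3, 4)):
    """Train one model per random seed and report the mean pairwise JSD of
    their attention distributions over held-out instances."""
    attn_by_seed = []
    for seed in seeds:
        torch.manual_seed(seed)
        model = train_fn(model_factory())                # returns a trained model
        with torch.no_grad():
            attn_by_seed.append(
                [model(x.unsqueeze(0))[1].squeeze(0) for x in instances])
    jsds = []
    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            for a, b in zip(attn_by_seed[i], attn_by_seed[j]):
                jsds.append(jensen_shannon(a, b))
    return torch.stack(jsds).mean()
```

The resulting per-dataset number serves as the background level of variation against which any adversarially induced divergence should be judged.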
Diagnostic Framework
The authors introduce an MLP model guided by pre-trained attention distributions. The results show that attention scores taken from the original LSTM models are useful and consistent: imposing them improves the MLP's performance relative to a distribution the MLP learns on its own. This supports the notion that attention mechanisms capture meaningful token importance that transcends a specific model architecture.
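A sketch of this diagnostic setup, under the same illustrative assumptions as above (the class name `GuidedMLP` and its dimensions are hypothetical):

```python
import torch
import torch.nn as nn

class GuidedMLP(nn.Module):
    """Non-contextual diagnostic model: each token is embedded and transformed
    independently by an MLP, then pooled with externally supplied (frozen)
    attention weights instead of weights learned by this model."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.token_mlp = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.Tanh())
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, tokens, frozen_alpha):
        # tokens: (B, T); frozen_alpha: (B, T) attention from a pre-trained LSTM
        h = self.token_mlp(self.embed(tokens))               # (B, T, hid_dim)
        context = torch.einsum("bt,bth->bh", frozen_alpha, h)
        return self.classifier(context)
```

If this guided MLP outperforms the same MLP using its own learned or uniform weights, the LSTM's attention scores encode token importance that does not depend on the LSTM's contextual representations.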
Adversarial Training Protocol
The authors propose a coherent end-to-end training protocol for adversarial attention distributions that accounts for both prediction similarity and attention-score divergence. Their findings confirm that although adversarial distributions can be found, they perform poorly when used to guide simpler models, indicating that trained attention mechanisms do capture essential information about the data.
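A per-instance version of such an objective might look like the sketch below; the choice of KL as the attention-divergence term and the weight `lam` are assumptions of this sketch, not necessarily the paper's exact formulation:

```python
import torch

def adversarial_loss(y_adv, y_base, alpha_adv, alpha_base, lam=1.0, eps=1e-12):
    """Keep predictions close to the base model (small total variation distance)
    while pushing the attention distribution away from it (large divergence)."""
    # Total variation distance between the two models' output distributions
    tvd = 0.5 * torch.sum(torch.abs(y_adv - y_base), dim=-1)
    # KL(alpha_base || alpha_adv) as the attention-divergence term (assumption)
    kl = torch.sum(alpha_base * (torch.log(alpha_base + eps)
                                 - torch.log(alpha_adv + eps)), dim=-1)
    # Minimizing this trades off prediction fidelity against attention distance
    return (tvd - lam * kl).mean()
```

Training the adversary with this kind of objective, rather than searching for counterfactual weights post hoc, is what makes the protocol a fair test of whether genuinely different attention distributions can support the same predictions.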
Implications and Future Directions
The paper's findings have significant theoretical and practical implications for the use of attention mechanisms in NLP. The authors show that attention scores, despite their variability, provide useful insights into model behavior and token importance. These insights challenge the exclusivity requirement assumed by Jain and Wallace, suggesting that multiple valid explanations can coexist.
Future research directions include extending the analysis to other tasks and languages, incorporating human evaluations, and developing theoretical frameworks for estimating the usefulness of attention mechanisms based on dataset and model properties. Additionally, exploring the existence of multiple adversarial attention models can further elucidate the limits and potential of attention as an explanatory tool.
Conclusion
Wiegreffe and Pinter's paper offers a comprehensive critique and alternative evaluation methods for the role of attention in model explainability. By demonstrating that attention mechanisms can provide meaningful explanations under certain conditions, the authors contribute to a more nuanced understanding of explainability in NLP models. Their work serves as a valuable resource for researchers aiming to develop robust and interpretable AI systems.