On the Suitability of Attention and Saliency Methods for Model Explanation
The paper by Jasmijn Bastings and Katja Filippova titled "The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?" addresses the ongoing debate over using attention mechanisms as explanatory tools in neural network models, particularly in NLP. The authors question the growing tendency to treat attention weights as explanations and advocate a more deliberate focus on saliency methods.
Core Argument and Objective
The primary argument centers on the purpose of model explanations: determining which input features are pivotal to a model's prediction. The authors argue that while attention mechanisms, such as the one introduced by Bahdanau et al. (2015), assign weights to input tokens, those weights do not faithfully explain a model's decision-making process, particularly when the intended user is a model developer concerned with the fidelity of the explanations. Saliency methods, which are designed to attribute relevance to input features with respect to the output, are deemed more suitable for this task.
Examination of Attention
The debate on attention as explanation centers on whether attention weights can convey the influence of each input token on a model's prediction. Papers by Jain and Wallace (2019), Serrano and Smith (2019), and Wiegreffe and Pinter (2019) evaluate the faithfulness of attention weights as indicators of feature importance. These studies find that attention-based explanations often diverge from gradient-based importance measures and can be altered without changing model predictions, suggesting that attention weights are limited in capturing the causal relationship between inputs and outputs.
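To make this kind of faithfulness probe concrete, the sketch below compares the token ranking induced by attention weights with the ranking induced by a gradient-based importance score on a toy attention classifier. The model, vocabulary, and token IDs are illustrative assumptions, not material from the cited papers; only the general idea of correlating attention against gradient importance follows those studies.

```python
# Illustrative faithfulness probe: do attention weights rank tokens the same
# way as a gradient-based importance score? (Toy model; assumptions only.)
import torch
import torch.nn as nn
from scipy.stats import kendalltau

torch.manual_seed(0)


class ToyAttentionClassifier(nn.Module):
    """Embeds tokens, attends over them with a learned query, classifies."""

    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.query = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, embedded):                      # (batch, seq, dim)
        attn = torch.softmax(self.query(embedded).squeeze(-1), dim=-1)
        context = (attn.unsqueeze(-1) * embedded).sum(dim=1)
        return self.out(context), attn


model = ToyAttentionClassifier().eval()
tokens = torch.tensor([[5, 17, 42, 8, 3]])            # one hypothetical sentence

# Attention weights for the input.
embedded = model.emb(tokens).detach().requires_grad_(True)
logits, attn = model(embedded)
target = logits.argmax(dim=-1).item()

# Gradient-x-input importance of each token for the predicted class.
logits[0, target].backward()
saliency = (embedded.grad * embedded).sum(dim=-1).squeeze(0).abs().detach()

# Rank agreement between the two explanations; Jain and Wallace (2019)
# report that such correlations are often weak for trained models.
attn_weights = attn.detach().squeeze(0)
tau, _ = kendalltau(attn_weights.numpy(), saliency.numpy())
print("attention  :", attn_weights)
print("saliency   :", saliency)
print("kendall tau:", tau)
```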
Saliency Methods
Bastings and Filippova review various saliency methods, such as gradient-based methods, Layer-Wise Relevance Propagation (LRP), and occlusion-based techniques. Each method is discussed for its potential to provide a more direct measure of input relevance:
- Gradient-Based Methods: These methods utilize derivatives to determine the sensitivity of the model's output to changes in input features. Notably, Integrated Gradients address the saturation problem that can arise with vanilla gradients.
- Layer-Wise Relevance Propagation: This approach redistributes relevance scores layer-by-layer from the output back to the input, offering explainability by highlighting the contribution of each input across the network.
- Occlusion-Based Techniques: These involve systematically occluding parts of the input and observing the change in the output, gauging feature importance from the resulting shift in the model's prediction (a minimal sketch of gradient-, integrated-gradients-, and occlusion-based attribution follows this list).
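The sketch below illustrates these three attribution strategies on a toy bag-of-embeddings classifier. The model, vocabulary, and token IDs are assumptions made for the example; the attribution logic follows the standard formulations (gradient x input, Integrated Gradients with a zero baseline, and occlusion by replacing a token with padding), not any specific implementation from the paper.

```python
# Minimal sketch of input-attribution (saliency) methods on a toy classifier.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMB_DIM, NUM_CLASSES, PAD_ID = 100, 16, 2, 0


class ToyClassifier(nn.Module):
    """Embeds tokens, mean-pools, and classifies with a linear layer."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_ID)
        self.out = nn.Linear(EMB_DIM, NUM_CLASSES)

    def forward_from_embeddings(self, embedded):      # (batch, seq, dim)
        return self.out(embedded.mean(dim=1))          # (batch, classes)

    def forward(self, token_ids):                      # (batch, seq)
        return self.forward_from_embeddings(self.emb(token_ids))


model = ToyClassifier().eval()
tokens = torch.tensor([[5, 17, 42, 8]])                # one hypothetical sentence
target = 1                                             # class whose score we explain


def grad_x_input(model, tokens, target):
    """Gradient of the target logit w.r.t. each token embedding,
    multiplied elementwise by the embedding and summed over dimensions."""
    embedded = model.emb(tokens).detach().requires_grad_(True)
    logit = model.forward_from_embeddings(embedded)[0, target]
    logit.backward()
    return (embedded.grad * embedded).sum(dim=-1).squeeze(0)


def integrated_gradients(model, tokens, target, steps=50):
    """Average gradients along a straight path from a zero baseline to the
    actual embeddings, then scale by (input - baseline). This mitigates the
    saturation problem of vanilla gradients."""
    embedded = model.emb(tokens).detach()
    baseline = torch.zeros_like(embedded)
    total_grads = torch.zeros_like(embedded)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (embedded - baseline)).requires_grad_(True)
        logit = model.forward_from_embeddings(point)[0, target]
        total_grads += torch.autograd.grad(logit, point)[0]
    avg_grads = total_grads / steps
    return ((embedded - baseline) * avg_grads).sum(dim=-1).squeeze(0)


def occlusion(model, tokens, target):
    """Importance of token i = drop in the target logit when that token
    is replaced by the padding token."""
    with torch.no_grad():
        base = model(tokens)[0, target]
        scores = []
        for i in range(tokens.size(1)):
            occluded = tokens.clone()
            occluded[0, i] = PAD_ID
            scores.append(base - model(occluded)[0, target])
    return torch.stack(scores)


print("grad x input        :", grad_x_input(model, tokens, target))
print("integrated gradients:", integrated_gradients(model, tokens, target))
print("occlusion           :", occlusion(model, tokens, target))
```

Each function returns one relevance score per input token; in practice these scores would be computed for a trained model and visualized as a heatmap over the input text.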
Comparative Analysis
The authors make a compelling case for preferring saliency methods over attention for explanation purposes. Saliency methods are explicitly designed to account for how input features affect predictions, taking the network's entire computation path into account. In contrast, attention weights are computed at a single intermediate layer and do not necessarily reflect the cumulative reasoning of the full model architecture.
Implications and Future Directions
Shifting focus from attention to saliency methods has substantial implications for building more interpretable AI systems. By emphasizing the faithfulness of explanations, the research community can move towards systems that give more accurate and informative accounts of their predictions. This shift could also pave the way for techniques that combine multiple dimensions of interpretability, improving model transparency and trustworthiness across applications.
Conclusion
In summary, Bastings and Filippova advocate a recalibration of focus in interpretability research. While they acknowledge the utility of attention mechanisms in specific contexts, their examination reaffirms that saliency methods are better aligned with the transparency and faithfulness objectives of model developers. The paper calls for a nuanced appreciation of interpretability objectives and for clearly articulated goals in future research.