Sanity Checks for Saliency Maps: A Critical Assessment
"Sanity Checks for Saliency Maps" by Julius Adebayo et al. introduces a rigorous methodology to evaluate the faithfulness of saliency methods used as interpretability tools in machine learning models. As interpretability becomes increasingly critical in applications involving deep neural networks (DNNs), particularly those in image classification, the paper addresses the fundamental issue of assessing the quality and scope of explanation methods.
Saliency methods highlight the input features that a model supposedly relies on for its prediction, but Adebayo et al. caution that visual appeal alone can be deceptive: a map may look plausible while revealing nothing about the model. The paper therefore proposes an evaluation methodology grounded in statistical randomization tests, specifically model parameter randomization and data (label) randomization.
Key Methodologies
The proposed methodology is centered around two novel randomization tests:
- Model Parameter Randomization Test:
- This test determines whether the saliency map is influenced by the learned parameters of the model.
- It compares the saliency map produced by a trained model against maps produced by copies of the model in which the parameters of some or all layers have been re-initialized at random, either one layer at a time (cascading from the top layer toward the input) or independently.
- The test reveals that methods such as Guided Backpropagation produce maps that closely track edges in the input image and remain largely invariant as upper-layer parameters are randomized, and therefore fail to reflect what the model has actually learned (a minimal sketch of this procedure appears after this list).
- Data Randomization Test:
- This test evaluates whether a saliency map reflects the relationship between data and labels.
- It involves training a model on a copy of the dataset in which the labels have been randomly permuted (so the model can only memorize, not generalize) and comparing its saliency maps with those of a model trained on the true labels.
- Saliency methods like Integrated Gradients and Gradient⊙Input retained the structure of the input image even when the model had been trained on random labels, exhibiting an alarming insensitivity to the actual label-specific features.
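As a concrete illustration, here is a minimal sketch of the cascading variant of the model parameter randomization test for a plain-gradient saliency map, written against PyTorch. The names `model`, `x` (a single preprocessed input batch), and `target_class` are assumptions, each layer is re-initialized with its own `reset_parameters()`, and reverse module-registration order stands in for "top of the network first"; the paper's actual experiments apply the same idea to many saliency methods.

```python
import copy
import torch

def gradient_saliency(model, x, target_class):
    """Plain-gradient saliency: |d(score of target class) / d(input)|."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.detach().abs().squeeze(0)

def cascading_randomization_maps(model, x, target_class):
    """Re-initialize parameterized layers one at a time, from the output layer
    back toward the input, recording the saliency map after each step."""
    randomized = copy.deepcopy(model).eval()
    maps = {"trained": gradient_saliency(randomized, x, target_class)}
    layers = [(name, m) for name, m in randomized.named_modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    for name, module in reversed(layers):   # roughly top (logits) layer first
        module.reset_parameters()           # overwrite the learned weights
        maps[name] = gradient_saliency(randomized, x, target_class)
    return maps
```

Comparing each entry of `maps` against `maps["trained"]`, visually or with a similarity score, shows at which depth, if any, the method starts reacting to the destroyed parameters.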
Major Findings
Through extensive experiments across several datasets (ImageNet, MNIST, Fashion MNIST) and model architectures (Inception v3, a CNN, an MLP), the paper yields notable insights:
- Sensitivity to Model Parameters:
- Saliency maps from methods such as the plain gradient and GradCAM changed substantially when model parameters were randomized, demonstrating the desired dependence on the learned parameters.
- Maps from methods such as Guided Backpropagation remained largely unchanged until the convolutional weights closest to the input were randomized, raising doubts about their utility for debugging or model introspection.
- Sensitivity to Data Relationships:
- Saliency maps from methods such as Guided Backprop and Gradient⊙Input still appeared visually coherent and meaningful even for models trained on completely randomized labels. This underscores the risk of mistaking visually plausible saliency maps for valid explanations (the sketch below shows how this check can be run).
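The data randomization test can be scripted in the same spirit. The sketch below assumes a hypothetical `train_model(inputs, labels)` helper and reuses the `gradient_saliency` function from the earlier sketch; the essential requirement, per the paper, is that the permuted-label model is trained until it memorizes the random labels.

```python
import numpy as np

def data_randomization_test(train_inputs, train_labels, x, target_class):
    """Train one model on the true labels and one on permuted labels,
    then compare their saliency maps for the same input."""
    permuted = np.random.permutation(train_labels)    # break the input-label relationship
    true_model = train_model(train_inputs, train_labels)
    rand_model = train_model(train_inputs, permuted)  # fit until the random labels are memorized
    map_true = gradient_saliency(true_model, x, target_class)
    map_rand = gradient_saliency(rand_model, x, target_class)
    # A method that truly depends on the learned input-label relationship
    # should produce clearly different maps for the two models.
    return map_true, map_rand
```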
Practical and Theoretical Implications
The paper's findings bear critical implications for the application of saliency methods in sensitive areas such as medical imaging or legal decision-making systems. Practitioners should be wary of over-relying on visually appealing saliency maps that may not faithfully represent the underlying model or data relationships.
From a theoretical standpoint, the paper suggests that explanation methods should be designed to depend, by construction, on the model's learned parameters and on the data-label relationship, reducing the chance of misleading explanations. As the field moves forward, assessments like those proposed in this paper should become standard benchmarks for vetting new interpretability techniques.
Future Directions
Moving forward, the paper invites further exploration into:
- Additional types of randomization and invariance assessments beyond model parameters and label randomization.
- Designing explanation methods that incorporate explicit dependencies on relevant data and model characteristics.
- Developing quantitative benchmarks that capture the qualitative good practices of interpretability (one possible building block is sketched below).
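One possible building block, in the spirit of the similarity measures used in the paper, is a pair of quantitative scores comparing two saliency maps: Spearman rank correlation of absolute saliency values and structural similarity (SSIM). The sketch below assumes the maps are 2-D NumPy arrays of equal shape and uses `scipy` and `scikit-image` for the metrics.

```python
import numpy as np
from scipy.stats import spearmanr
from skimage.metrics import structural_similarity

def saliency_similarity(map_a, map_b):
    """Return (Spearman rank correlation, SSIM) between two saliency maps."""
    a = np.abs(np.asarray(map_a, dtype=np.float64))
    b = np.abs(np.asarray(map_b, dtype=np.float64))
    rank_corr = spearmanr(a.ravel(), b.ravel()).correlation
    # Absolute maps start at zero, so the larger maximum is a reasonable data range.
    ssim = structural_similarity(a, b, data_range=float(max(a.max(), b.max())))
    return rank_corr, ssim
```

Low similarity after parameter or label randomization is the behavior a faithful method should exhibit.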
In summary, "Sanity Checks for Saliency Maps" equips the community with rigorous, systematic evaluation techniques for saliency methods, paving the way for more reliable and transparent machine learning explanations.