The (Un)reliability of Saliency Methods: An Insightful Overview
Saliency methods are instrumental in interpreting the decision processes of deep neural networks, yet many of them prove unreliable. This paper examines that unreliability, focusing on input invariance as a criterion for trustworthy saliency attributions.
Key Insights
The paper identifies a fundamental issue with saliency methods: their sensitivity to input transformations that have no effect on model predictions. This is probed with a simple transformation, a constant shift of the input, which causes many saliency methods to alter their attributions even though the model's output is unchanged. The authors propose input invariance as a necessary property of trustworthy saliency methods: the attribution should mirror the sensitivity, or insensitivity, of the model itself to transformations of the input.
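To make the setup concrete, here is a minimal sketch of the constant-shift construction (the architecture, shift vector, and PyTorch implementation are illustrative assumptions, not the paper's code): the shift is absorbed into the first-layer bias, so the two networks make identical predictions on corresponding inputs.

```python
import torch
import torch.nn as nn

# Minimal sketch of the constant-shift construction (assumed sizes and
# framework): net2 sees inputs shifted by a constant vector m, but its
# first-layer bias absorbs the shift, so its predictions match net1's.
torch.manual_seed(0)
net1 = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
net2 = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
net2.load_state_dict(net1.state_dict())        # identical weights

m = torch.full((784,), 0.3)                    # hypothetical constant shift
with torch.no_grad():
    net2[0].bias -= net2[0].weight @ m         # b2 = b1 - W1 @ m

x = torch.rand(1, 784)
assert torch.allclose(net1(x), net2(x + m), atol=1e-5)   # same predictions
```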
Methodological Examination
The paper examines a range of saliency methods, grouped into three categories: gradients, signal, and attribution. The main findings for each category are:
- Gradients: Invariant under the constant shift, because the raw gradient depends only on the model's weights and activations, which are identical for corresponding inputs (a small invariance check follows this list).
- Signal Methods: Likewise unaffected by the shift, producing unchanged attributions.
- Attribution Methods: Integrated Gradients and Deep Taylor Decomposition depend on a reference point, and the choice of that reference point determines whether input invariance holds.
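As a rough illustration of the gradient case (reusing `net1`, `net2`, `m`, and `x` from the sketch above), the plain gradient computed for the shifted network on the shifted input matches the gradient for the original network on the original input:

```python
def gradient_saliency(net, x, target=None):
    """Gradient of a class score w.r.t. the input (top predicted class by default)."""
    x = x.clone().requires_grad_(True)
    scores = net(x)
    if target is None:
        target = scores.argmax().item()
    scores[0, target].backward()
    return x.grad.detach()

# Input invariance holds for plain gradients: the map is unchanged when both
# the input and the network are shifted consistently.
assert torch.allclose(gradient_saliency(net1, x),
                      gradient_saliency(net2, x + m), atol=1e-5)
```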
Experimental Framework and Results
The authors evaluate these methods on networks trained on the MNIST dataset. They compare two networks that make identical predictions: one receives the standard inputs, while the other receives inputs shifted by a constant vector and has its first-layer bias adjusted to absorb the shift. Attribution methods such as Integrated Gradients and Deep Taylor Decomposition are not input-invariant when paired with an inappropriate reference point: their attributions differ between the two networks even though every prediction is identical.
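The reference-point dependence can be sketched with a rough Integrated Gradients implementation (a Riemann-sum approximation reusing the helpers above; the step count and baselines are illustrative, not the paper's settings). With a fixed zero baseline the two networks' attributions diverge; shifting the baseline together with the input restores agreement:

```python
def integrated_gradients(net, x, baseline, steps=64):
    # Riemann-sum approximation: accumulate gradients of the predicted class
    # along the straight-line path from `baseline` to `x`.
    target = net(x).argmax().item()
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        total += gradient_saliency(net, baseline + alpha * (x - baseline), target)
    return (x - baseline) * total / steps

zero = torch.zeros_like(x)
ig1 = integrated_gradients(net1, x, baseline=zero)                  # "black image" reference
ig2_fixed = integrated_gradients(net2, x + m, baseline=zero)        # same fixed reference
ig2_shifted = integrated_gradients(net2, x + m, baseline=zero + m)  # reference shifted with the input

print(torch.allclose(ig1, ig2_fixed, atol=1e-4))    # False: fixed reference breaks invariance
print(torch.allclose(ig1, ig2_shifted, atol=1e-4))  # True: shifted reference restores it
```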
Moreover, SmoothGrad, a technique that sharpens saliency visualizations by averaging maps computed on noisy copies of the input, is shown to inherit the invariance properties of the method it wraps. This underscores that the foundational method's properties carry over when such augmentation techniques are applied.
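A minimal SmoothGrad sketch (the noise level and sample count are arbitrary choices for illustration) shows why this is so: it merely averages the wrapped method's maps over noisy copies of the input, so it can be no more and no less input-invariant than that method.

```python
def smoothgrad(saliency_fn, net, x, noise_std=0.15, n_samples=25):
    # Average the underlying saliency map over Gaussian-perturbed copies of
    # the input; invariance (or its absence) is inherited from `saliency_fn`.
    maps = [saliency_fn(net, x + noise_std * torch.randn_like(x))
            for _ in range(n_samples)]
    return torch.stack(maps).mean(dim=0)

# With matched noise samples, SmoothGrad over plain gradients stays invariant.
torch.manual_seed(1)
sg1 = smoothgrad(gradient_saliency, net1, x)
torch.manual_seed(1)
sg2 = smoothgrad(gradient_saliency, net2, x + m)
assert torch.allclose(sg1, sg2, atol=1e-5)
```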
Theoretical and Practical Implications
The implications of this paper are broad. The input-invariance requirement exposes the risk of deploying saliency methods without first checking how they respond to transformations that leave the model's predictions unchanged. This is particularly pertinent in high-stakes applications such as healthcare, where misattributions could have costly consequences.
Future Directions
The realization that current approaches do not all satisfy the input-invariance criterion suggests a pathway for future research. This includes systematic normalization or other preprocessing steps that restore invariance under broader classes of input transformations, as well as a deeper understanding and formalization of reference points, which could lead to saliency methods whose attributions are reliable by construction.
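As one illustration of what such a preprocessing step might look like (an assumption for the sake of example, not a fix proposed in the paper, and reusing the definitions from the earlier sketches), building per-image mean subtraction into the pipeline makes both the model and any saliency method applied to it invariant to uniform constant shifts; a non-uniform per-pixel shift would survive this step.

```python
class MeanCentered(nn.Module):
    # Hypothetical preprocessing wrapper: subtract each input's own mean so a
    # uniform constant shift cancels before the network (and any saliency
    # method applied to the whole pipeline) ever sees it.
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, x):
        return self.net(x - x.mean(dim=1, keepdim=True))

centered = MeanCentered(net1)
c = 0.3                                               # uniform shift
assert torch.allclose(centered(x), centered(x + c), atol=1e-5)
assert torch.allclose(gradient_saliency(centered, x),
                      gradient_saliency(centered, x + c), atol=1e-5)
```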
Conclusion
The paper marks an essential step towards more trustworthy interpretability in neural networks by identifying and addressing a specific source of unreliability in saliency methods. It advocates for further exploration into achieving invariance, both through methodological innovation and improved theoretical frameworks. This is crucial for deploying neural networks in critical fields where interpretability and reliability must be uncompromised.