On the Faithfulness of Vision Transformer Explanations
Abstract: To interpret Vision Transformers, post-hoc explanations assign salience scores to input pixels, providing human-understandable heatmaps. However, whether these interpretations reflect the true rationales behind the model's output remains underexplored. To address this gap, we study the faithfulness criterion of explanations: the assigned salience scores should represent the influence of the corresponding input pixels on the model's predictions. To evaluate faithfulness, we introduce the Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric that leverages essential information about the salience distribution. Specifically, we conduct pair-wise comparisons among distinct pixel groups and then aggregate the differences in their salience scores, resulting in a coefficient that indicates the explanation's degree of faithfulness. Our explorations reveal that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, and thereby fail to capture the faithfulness property. In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations. Furthermore, SaCo demonstrates that the use of gradient information and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanations, shedding light on potential paths for advancing Vision Transformer explainability.
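The abstract describes SaCo only at a high level (rank pixels by salience, form groups, compare groups pair-wise, aggregate salience differences). The snippet below is a minimal Python sketch of that idea, not the paper's implementation: the group count, the zero-masking perturbation used to measure a group's influence, the `confidence_fn` interface, and the normalization are all illustrative assumptions.

```python
import numpy as np

def saco_sketch(image, salience, confidence_fn, num_groups=10):
    """Sketch of a salience-guided faithfulness coefficient.

    image:         H x W x C array.
    salience:      H x W salience map from some explanation method.
    confidence_fn: hypothetical callable mapping an image to the model's
                   confidence in its originally predicted class.
    num_groups:    number of pixel groups ranked by salience (assumed value).
    """
    h, w = salience.shape
    order = np.argsort(salience.ravel())[::-1]      # pixels, most salient first
    groups = np.array_split(order, num_groups)      # G_1 ... G_K by descending salience

    base_conf = confidence_fn(image)
    group_salience, group_influence = [], []
    for g in groups:
        # Total salience the explanation assigns to this group.
        group_salience.append(salience.ravel()[g].sum())
        # Influence: confidence drop when the group's pixels are zero-masked
        # (one simple perturbation choice, assumed here).
        perturbed = image.copy().reshape(h * w, -1)
        perturbed[g] = 0.0
        group_influence.append(base_conf - confidence_fn(perturbed.reshape(image.shape)))

    # Pair-wise comparisons: a faithful explanation should assign more salience
    # to groups that influence the prediction more.
    score, total = 0.0, 0.0
    for i in range(num_groups):
        for j in range(i + 1, num_groups):
            weight = abs(group_salience[i] - group_salience[j])
            total += weight
            if group_influence[i] >= group_influence[j]:
                score += weight   # influence ordering agrees with salience ordering
            else:
                score -= weight   # ordering violated
    return score / total if total > 0 else 0.0
```

In this sketch the coefficient lies in [-1, 1]: it approaches 1 when more-salient groups consistently influence the prediction more, and falls toward -1 when the salience ordering is systematically violated, which is what lets a Random Attribution baseline be distinguished from a faithful explanation.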