Toward Understanding the Disagreement Problem in Neural Network Feature Attribution (2404.11330v1)
Abstract: In recent years, neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data. However, understanding the inner workings of these black box models remains challenging, yet crucial for high-stakes decisions. Among the prominent approaches for explaining these black boxes are feature attribution methods, which assign relevance or contribution scores to each input variable for a model prediction. Despite the plethora of proposed techniques, ranging from gradient-based to backpropagation-based methods, a significant debate persists about which method to use. Various evaluation metrics have been proposed to assess the trustworthiness or robustness of these methods' results. However, current research highlights disagreement among state-of-the-art methods in their explanations. Our work addresses this confusion by investigating the explanations' fundamental and distributional behavior. Additionally, through a comprehensive simulation study, we illustrate the impact of common scaling and encoding techniques on explanation quality, assess their efficacy across different effect sizes, and demonstrate the origin of inconsistency in rank-based evaluation metrics.
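The disagreement described in the abstract can be reproduced in a few lines. The sketch below is an illustrative example, not the paper's code: it explains the same toy regression network with two widely used attribution methods, Gradient x Input and Integrated Gradients, and then compares their feature rankings with a Spearman rank correlation. The network architecture, the random input, the zero baseline, and the 50 integration steps are all arbitrary assumptions made for the example.

```python
# Minimal sketch: two feature attribution methods applied to the same toy
# network, followed by a rank-based comparison of their explanations.
# All modeling choices (architecture, data, baseline, step count) are
# arbitrary assumptions for illustration only.
import torch
import torch.nn as nn
from scipy.stats import spearmanr

torch.manual_seed(0)

# Toy tabular model: 10 input features, one regression output.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(1, 10, requires_grad=True)

# --- Method 1: Gradient x Input -------------------------------------------
grad = torch.autograd.grad(model(x).sum(), x)[0]
grad_x_input = (grad * x).detach().squeeze()

# --- Method 2: Integrated Gradients (zero baseline, 50 steps) --------------
baseline = torch.zeros_like(x)
steps = 50
total_grad = torch.zeros_like(x)
for alpha in torch.linspace(0.0, 1.0, steps):
    x_interp = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
    total_grad += torch.autograd.grad(model(x_interp).sum(), x_interp)[0]
ig = ((x - baseline) * total_grad / steps).detach().squeeze()

# --- Rank-based comparison of the two explanations -------------------------
rho, _ = spearmanr(grad_x_input.abs().numpy(), ig.abs().numpy())
print("Gradient x Input:    ", grad_x_input.numpy().round(3))
print("Integrated Gradients:", ig.numpy().round(3))
print(f"Spearman rank correlation of |attributions|: {rho:.2f}")
```

The two attribution vectors generally differ, and the rank correlation quantifies how much the induced feature orderings agree; the paper investigates where such rank disagreement originates and how preprocessing choices such as scaling and encoding affect it.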
Authors: Niklas Koenen, Marvin N. Wright