Gradient based Feature Attribution in Explainable AI: A Technical Review (2403.10415v1)
Abstract: The surge in black-box AI models has prompted the need to explain their internal mechanisms and justify their reliability, especially in high-stakes applications such as healthcare and autonomous driving. Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research on explainability, interpretability, and transparency has been developed to explain and analyze models from various perspectives. Consequently, with an exhaustive list of papers, it becomes challenging to maintain a comprehensive overview of XAI research from all aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI: gradient-based explanations, which can be directly applied to neural network models. In this review, we systematically survey gradient-based explanation methods to date and introduce a novel taxonomy that categorizes them into four distinct classes. We then present the essential technical details in chronological order, underscoring the evolution of the algorithms. Next, we introduce both human and quantitative evaluations to measure algorithm performance. More importantly, we demonstrate the general challenges in XAI and the specific challenges of gradient-based explanations. We hope that this survey helps researchers understand state-of-the-art progress and its corresponding shortcomings, which could spark their interest in addressing these issues in future work.
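To make the object of study concrete, below is a minimal sketch of the simplest member of the family the survey covers: a vanilla gradient saliency map in the spirit of Simonyan et al., where each input pixel is attributed the gradient of the target class score with respect to that pixel. The ResNet-18 backbone and the random input are illustrative assumptions, not choices made by the paper.

```python
import torch
import torchvision.models as models

# Vanilla gradient saliency: attribute each input pixel by the
# gradient of the target class logit w.r.t. that pixel.
# Untrained weights keep the sketch self-contained; in practice a
# pretrained model would be explained.
model = models.resnet18(weights=None)
model.eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image
scores = model(x)                      # class logits, shape (1, 1000)
target = scores.argmax(dim=1).item()   # explain the predicted class

scores[0, target].backward()           # d(score_target) / d(input)

# Collapse the channel dimension to a single 224x224 saliency map.
saliency = x.grad.abs().max(dim=1).values.squeeze(0)
print(saliency.shape)  # torch.Size([224, 224])
```

Much of the survey's taxonomy concerns refinements of exactly this quantity, e.g., smoothing the gradient over noisy copies of the input (SmoothGrad) or integrating it along a path from a baseline (Integrated Gradients).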
Authors: Yongjie Wang, Tong Zhang, Xu Guo, Zhiqi Shen