XAI Benchmark for Visual Explanation (2310.08537v2)
Abstract: The rise of deep learning has brought significant progress on computer vision (CV) tasks, yet the "black box" nature of these models often precludes interpretability. This challenge has spurred the development of Explainable Artificial Intelligence (XAI), which generates explanations of an AI model's decision-making process. An explanation should not only faithfully reflect the model's true reasoning process (i.e., faithfulness) but also align with human reasoning (i.e., alignment). Within XAI, visual explanations use visual cues to elucidate the reasoning of machine learning models, particularly in image processing, by highlighting the regions of an image most important to a prediction. Despite the considerable body of research on visual explanations, standardized benchmarks for evaluating them remain seriously underdeveloped. In particular, to evaluate alignment, existing works typically either illustrate visual explanations for a handful of images or hire referees to rate explanation quality via ad hoc questionnaires; neither yields a standardized, quantitative, and comprehensive evaluation. To address this gap, we develop a benchmark for visual explanation consisting of eight datasets with human explanation annotations from various domains, accommodating both post-hoc and intrinsic visual explanation methods. We also devise a visual explanation pipeline comprising data loading, explanation generation, and method evaluation. Our benchmark enables a fair evaluation and comparison of visual explanation methods. Building on our curated collection of datasets, we benchmarked eight existing visual explanation methods and conducted a thorough comparison across four selected datasets using six alignment-based and causality-based metrics. Our benchmark is accessible through our website: https://xaidataset.github.io.
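To make the three-stage pipeline described above (data loading, explanation generation, method evaluation) concrete, here is a minimal sketch in Python. It is not the benchmark's actual API: the explainer (a simple occlusion-based, post-hoc saliency method), the alignment metric (IoU against a human annotation mask), and all function names (`occlusion_saliency`, `iou_alignment`, `evaluate`) are illustrative assumptions, not names from the paper.

```python
# Hypothetical sketch of a visual-explanation evaluation pipeline:
# (1) load (image, label, human_mask) triples, (2) generate a saliency
# map with a post-hoc explainer, (3) score alignment with the human mask.
import numpy as np

def occlusion_saliency(predict, image, label, patch=8):
    """Perturbation-based saliency: slide a mean-valued patch over the
    image and record the drop in the target class score. Larger drops
    mark regions more important to the prediction.
    `predict(img)` is assumed to return a vector of class scores."""
    h, w = image.shape[:2]
    base = predict(image)[label]
    sal = np.zeros((h, w), dtype=float)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()
            sal[y:y + patch, x:x + patch] = base - predict(occluded)[label]
    sal -= sal.min()
    return sal / sal.max() if sal.max() > 0 else sal

def iou_alignment(saliency, human_mask, threshold=0.5):
    """One simple alignment-style metric: IoU between the binarized
    saliency map and the human explanation annotation."""
    pred = saliency >= threshold
    gt = human_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def evaluate(predict, dataset):
    """Average the alignment score over (image, label, human_mask) triples."""
    scores = [iou_alignment(occlusion_saliency(predict, img, lbl), mask)
              for img, lbl, mask in dataset]
    return float(np.mean(scores))

if __name__ == "__main__":
    # Toy usage: the class-0 score of this dummy "model" depends only on
    # the center region, so the saliency should concentrate there.
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    mask = np.zeros((32, 32), dtype=bool)
    mask[8:24, 8:24] = True
    predict = lambda x: np.array([x[8:24, 8:24].mean(),
                                  1 - x[8:24, 8:24].mean()])
    print(evaluate(predict, [(img, 0, mask)]))
```

Any of the benchmarked explainers (e.g., Grad-CAM in place of occlusion) and any of the six alignment- or causality-based metrics could be slotted into the same skeleton, which is the point of a shared pipeline.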
Authors: Yifei Zhang, Siyi Gu, James Song, Bo Pan, Guangji Bai, Liang Zhao