Less is More: Discovering Concise Network Explanations (2405.15243v3)
Abstract: We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations that enhance the interpretability of deep neural image classifiers. Our method automatically finds the visual explanations that are critical for discriminating between classes. It does so by simultaneously optimizing three criteria: the explanations should be few, diverse, and human-interpretable. Our approach builds on the recently introduced Concept Relevance Propagation (CRP) explainability method. While CRP is effective at describing individual neuronal activations, it generates too many concepts, which hinders human comprehension. Instead, DCNE selects the few most important explanations. We introduce a new evaluation dataset centered on the challenging task of classifying birds, enabling us to measure how well DCNE's explanations align with those defined by human experts. Compared to existing eXplainable Artificial Intelligence (XAI) methods, DCNE offers a desirable trade-off between conciseness and completeness when summarizing network explanations. It produces 1/30 as many explanations as CRP while incurring only a slight reduction in explanation quality. DCNE represents a step forward in making neural network decisions accessible and interpretable to humans, providing a valuable tool for both researchers and practitioners in XAI and model alignment.
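The abstract does not spell out how the few explanations are chosen, so the sketch below illustrates the general idea only: starting from many CRP-style concept heatmaps with relevance scores, group spatially similar heatmaps and keep one highly relevant representative per group, yielding a small, diverse set. The array shapes, the `select_concise_explanations` function, and the use of DBSCAN with cosine distance are illustrative assumptions, not the paper's published procedure.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def select_concise_explanations(heatmaps, relevances, eps=0.3, min_samples=2):
    """Pick a few diverse, highly relevant concept heatmaps out of many.

    heatmaps:   (n_concepts, H, W) array of per-concept attribution maps.
    relevances: (n_concepts,) array of concept relevance scores.
    Returns concept indices, most relevant first.
    (Illustrative sketch; not the paper's actual selection algorithm.)
    """
    n = heatmaps.shape[0]
    flat = heatmaps.reshape(n, -1)
    # L2-normalize so clustering compares spatial patterns, not magnitudes.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)

    # Group spatially similar heatmaps; noise points (label -1) are dropped.
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(flat)

    selected = []
    for cluster_id in set(labels) - {-1}:
        members = np.where(labels == cluster_id)[0]
        # Keep the single most relevant concept as the cluster's representative.
        selected.append(members[np.argmax(relevances[members])])
    return sorted(selected, key=lambda i: -relevances[i])
```

In this hedged reading, clustering enforces diversity (near-duplicate heatmaps collapse to one representative), while keeping only the highest-relevance member per cluster keeps the selection both few and faithful to what the network actually relies on.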