Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees (2301.11911v2)
Abstract: The completeness axiom renders the explanation of a post-hoc XAI method only locally faithful to the model, i.e., faithful for a single decision. For the trustworthy application of XAI, in particular for high-stakes decisions, a more global model understanding is required. Recently, concept-based methods have been proposed, which, however, are not guaranteed to be bound to the actual model reasoning. To circumvent this problem, we propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. Our method starts from general linear subspaces as concepts and requires neither reinforcing concept interpretability nor re-training of model parts. We propose sparse subspace clustering to discover improved concepts and to fully leverage the potential of multi-dimensional subspaces. MCD offers two complementary analysis tools for concepts in input space: (1) concept activation maps, which show where a concept is expressed within a sample and allow concepts to be characterized through prototypical samples, and (2) concept relevance heatmaps, which decompose the model decision into concept contributions. Together, both tools enable a detailed understanding of the model reasoning, which is guaranteed to relate to the model via a completeness relation. This paves the way towards more trustworthy concept-based XAI. We empirically demonstrate the superiority of MCD over more constrained concept definitions.
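To make the pipeline concrete, below is a minimal numpy/scikit-learn sketch of the steps the abstract names: sparse subspace clustering of feature vectors, multi-dimensional concept subspaces, per-sample concept activations, and a completeness-style decomposition of a linear head's logit. It runs on synthetic features and is not the authors' implementation; the Lasso-based self-expressive coding, the fixed concept dimension `k`, the number of concepts, and the linear head `w` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
d, n = 32, 200
F = rng.normal(size=(n, d))        # stand-in for layer features of one class
w = rng.normal(size=d)             # stand-in for a linear classifier head

# 1) Self-expressive coding: write each feature as a sparse combination
#    of all the others (the core idea of sparse subspace clustering).
C = np.zeros((n, n))
for i in range(n):
    mask = np.arange(n) != i
    coder = Lasso(alpha=0.1, max_iter=5000)
    coder.fit(F[mask].T, F[i])     # F_i ~ sum_j c_ij F_j with c_ii = 0
    C[i, mask] = coder.coef_

# 2) Symmetric affinity from the sparse codes, then spectral clustering:
#    each cluster is treated as one multi-dimensional concept.
A = np.abs(C) + np.abs(C).T
n_concepts = 4
labels = SpectralClustering(n_clusters=n_concepts, affinity="precomputed",
                            random_state=0).fit_predict(A)

# 3) Fit one linear subspace (top principal directions) per cluster.
k = 3                              # assumed concept dimension
bases = []
for l in range(n_concepts):
    Fl = F[labels == l] - F[labels == l].mean(0)
    _, _, Vt = np.linalg.svd(Fl, full_matrices=False)
    bases.append(Vt[:k].T)         # d x k orthonormal basis

# 4) Concept activation for a sample: norm of its projection onto each subspace.
f = F[0]
activations = [np.linalg.norm(B.T @ f) for B in bases]

# 5) Completeness-style decomposition: split f into per-concept parts plus a
#    residual, so the concept contributions to the logit sum up exactly.
dims = np.concatenate([[0], np.cumsum([B.shape[1] for B in bases])])
Ball = np.hstack(bases)
coef, *_ = np.linalg.lstsq(Ball, f, rcond=None)
parts = [bases[l] @ coef[dims[l]:dims[l + 1]] for l in range(n_concepts)]
residual = f - sum(parts)
contribs = [w @ p for p in parts] + [w @ residual]
print(activations)
print(np.isclose(sum(contribs), w @ f))   # True: decomposition is complete
```

Roughly, applying step 4 location-wise over a convolutional feature map yields the concept activation maps, while distributing the per-concept logit contributions of step 5 over input locations yields the concept relevance heatmaps; the residual term is what closes the completeness relation when the concept subspaces do not span the full feature space.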
- From "where" to "what": Towards human-understandable explanations through concept relevance propagation. arXiv preprint 2206.03208, 2022.
- Scikit-dimension: a python package for intrinsic dimension estimation. arXiv preprint 2109.02596, 2021.
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7):e0130140, 2015.
- Network dissection: Quantifying interpretability of deep visual representations. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 6541–6549, 2017.
- Navier-stokes, fluid dynamics, and image and video inpainting. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2001.
- Towards novel insights in lattice field theory with explainable machine learning. Physical Review D, 101(9):094507, 2020.
- Preddiff: Explanations and interactions from conditional expectations. Artificial Intelligence, 312:103774, 2022. ISSN 0004-3702.
- Emerging properties in self-supervised vision transformers. In International Conference on Computer Vision, pp. 9630–9640, 2021.
- This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems, 32, 2019.
- Spectral clustering: A semi-supervised approach. Neurocomputing, 77(1):229–242, 2012.
- Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2(12):772–782, Dec 2020.
- Disentangled explanations of neural network predictions by finding relevant subspaces. arXiv preprint 2212.14855, 2022.
- Explaining by removing: A unified framework for model explanation. Journal of Machine Learning Research, 22(209):1–90, 2021.
- Jonathan Crabbé and Mihaela van der Schaar. Concept activation regions: A generalized framework for concept-based explanations. Advances in Neural Information Processing Systems, 35, 2022.
- Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
- Sparse subspace clustering: Algorithm, theory, and applications. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2765–2781. IEEE, 2013.
- Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8730–8738, 2018.
- An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 100(2):176–183, 1971.
- Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019.
- Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Scientific Reports, 10:6423, 2020.
- Jihun Hamm. Subspace-based learning with Grassmann kernels. PhD thesis, University of Pennsylvania, 2008.
- Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- Camille Jordan. Essai sur la géométrie à n𝑛nitalic_n dimensions. Bulletin de la Société Mathématique de France, 3:103–174, 1875.
- Pace: Posthoc architecture-agnostic concept extractor for explaining cnns. In International Joint Conference on Neural Networks, 2021.
- Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International Conference on Machine Learning, pp. 2668–2677. PMLR, 2018.
- Concept bottleneck models. In Hal Daum’e III and Aarti Singh (eds.), International Conference on Machine Learning, volume 119, pp. 5338–5348, 2020.
- Mace: Model agnostic concept extractor for explaining image classification networks. IEEE Transactions on Artificial Intelligence, 2(6):574–583, 2021.
- Unmasking clever hans predictors and assessing what machines really learn. Nature communications, 10:1096, 2019.
- Instance-wise or class-wise? a tale of neighbor shapley for concept-based explanation. In ACM International Conference on Multimedia, pp. 3664–3672, 2021.
- Swin transformer: Hierarchical vision transformer using shifted windows. In IEEE International Conference on Computer Vision, 2021.
- A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Glancenets: Interpretabile, leak-proof concept-based models. In UAI 2022 Workshop on Causal Representation Learning, 2022.
- Acquisition of chess knowledge in alphazero. Proceedings of the National Academy of Sciences, 119(47):e2206625119, 2022.
- Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018.
- Explainable artificial intelligence for bias detection in covid ct-scan classifiers. Sensors, 21(16):5657, 2021.
- Neural basis models for interpretability. Advances in Neural Information Processing Systems, 35, 2022.
- Innovation pursuit: A new approach to the subspace clustering problem. In International Conference on Machine Learning, volume 70, pp. 2874–2882, 2017a.
- Subspace clustering via optimal direction search. IEEE Signal Processing Letters, 24(12):1793–1797, 2017b.
- "why should I trust you?": Explaining the predictions of any classifier. In International Conference Knowledge Discovery and Data Mining, pp. 1135–1144, 2016.
- Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673, 2017. doi: 10.1109/TNNLS.2016.259982.
- Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.
- Cybersecurity knowledge extraction using xai. Applied Sciences, 12(17):8669, 2022.
- Grad-cam: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, 2020.
- A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195–2238, 2012.
- Axiomatic attribution for deep networks. In International Conference on Machine Learning, volume 70, pp. 3319–3328. JMLR, 2017.
- Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
- Just another “clever hans”? neural networks and fdg pet-ct to predict the outcome of patients with breast cancer. European journal of nuclear medicine and molecular imaging, 48(10):3141–3150, 2021.
- Ross Wightman. Pytorch image models. https://github.com/rwightman/pytorch-image-models, 2019.
- Resnet strikes back: An improved training procedure in timm. arXiv preprint 2110.00476, 2021.
- Towards global explanations of convolutional neural networks with concept attribution. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8649–8658, 2020.
- On completeness-aware concept-based explanations in deep neural networks. Advances in Neural Information Processing Systems, 33, 2020.
- Oracle based active set algorithm for scalable elastic net subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3928–3937, 2016a.
- Scalable sparse subspace clustering by orthogonal matching pursuit. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3918–3927, 2016b.
- Concept embedding models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022.
- Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In AAAI Conference on Artificial Intelligence, pp. 11682–11690, 2021.
- Learning deep features for discriminative localization. In IEEE Conference on Computer Cision and Pattern Recognition, pp. 2921–2929, 2016.