On the Concept Trustworthiness in Concept Bottleneck Models (2403.14349v1)
Abstract: Concept Bottleneck Models (CBMs), which break down the reasoning process into an input-to-concept mapping and a concept-to-label prediction, have garnered significant attention for the interpretability afforded by their concept bottleneck. However, despite the transparency of the concept-to-label prediction, the mapping from the input to the intermediate concepts remains a black box, raising concerns about the trustworthiness of the learned concepts (i.e., these concepts may be predicted from spurious cues). This concept untrustworthiness greatly hampers the interpretability of CBMs, hindering their further advancement. To analyze this issue comprehensively, we establish a benchmark for assessing the trustworthiness of concepts in CBMs. We propose a new metric, the concept trustworthiness score, to gauge whether the concepts are derived from relevant regions. Additionally, we introduce an enhanced CBM that predicts each concept from a distinct part of the feature map, making it possible to trace the region each concept relies on. On top of this model, we introduce three modules, namely the cross-layer alignment (CLA) module, the cross-image alignment (CIA) module, and the prediction alignment (PA) module, to further enhance concept trustworthiness. Experiments on five datasets across ten architectures demonstrate that, without using any concept localization annotations during training, our model improves concept trustworthiness by a large margin while achieving accuracy superior to state-of-the-art methods. Our code is available at https://github.com/hqhQAQ/ProtoCBM.
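The metric described in the abstract asks whether each concept is predicted from its relevant image region. Below is a minimal sketch of that idea, not the paper's implementation: it assumes a CBM variant that emits a per-location concept activation map (as the enhanced CBM above does) and that ground-truth part masks are available at feature-map resolution for evaluation only (consistent with the paper's claim that no localization annotations are used during training). The function name and tensor layout are illustrative assumptions.

```python
import torch

def concept_trustworthiness_score(concept_logits_map, part_masks):
    """Sketch of a concept trustworthiness score (hypothetical API).

    For each concept, find the spatial location on the feature map that
    contributes most to the concept's prediction, and check whether that
    location falls inside the concept's annotated ground-truth region.

    concept_logits_map: (K, H, W) per-location concept predictions.
    part_masks: (K, H, W) binary masks of each concept's ground-truth
        region, downsampled to feature-map resolution.

    Returns the fraction of concepts whose peak activation lies inside
    the correct region (1.0 = fully trustworthy under this proxy).
    """
    K, H, W = concept_logits_map.shape
    flat = concept_logits_map.view(K, -1)            # (K, H*W)
    peak_idx = flat.argmax(dim=1)                    # most responsible location per concept
    hits = part_masks.view(K, -1)[torch.arange(K), peak_idx]
    return hits.float().mean().item()

# Example with random data: 112 concepts on a 7x7 feature map.
score = concept_trustworthiness_score(
    torch.randn(112, 7, 7), (torch.rand(112, 7, 7) > 0.8).float()
)
```

Under this proxy, a model whose concept predictions peak on spurious background regions scores low even if its label accuracy is high, which is exactly the failure mode the benchmark is designed to expose.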
Authors: Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song