Do Concept Bottleneck Models Respect Localities? (2401.01259v3)
Abstract: Concept-based methods explain model predictions using human-understandable concepts. These models require accurate concept predictors, yet the faithfulness of existing concept predictors to their underlying concepts is unclear. In this paper, we investigate the faithfulness of Concept Bottleneck Models (CBMs), a popular family of concept-based architectures, by looking at whether they respect "localities" in datasets. Localities involve using only relevant features when predicting a concept's value. When localities are not considered, concepts may be predicted based on spuriously correlated features, degrading performance and robustness. This work examines how CBM predictions change when perturbing model inputs, and reveals that CBMs may not capture localities, even when independent concepts are localised to non-overlapping feature subsets. Our empirical and theoretical results demonstrate that datasets with correlated concepts may lead to accurate but uninterpretable models that fail to learn localities. Overall, we find that CBM interpretability is fragile, as CBMs occasionally rely upon spurious features, necessitating further research into the robustness of concept predictors.
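The locality test described in the abstract can be made concrete with a simple perturbation probe: if a concept truly depends only on a known subset of input features, then adding noise to the *other* features should not change its predicted value. Below is a minimal, hypothetical sketch of such a check; the toy concept predictor, feature dimensions, and relevance mask are illustrative assumptions, not the authors' actual models or evaluation protocol.

```python
# Minimal sketch of a locality check for a concept predictor (illustrative only).
# Idea: perturb features OUTSIDE the concept's assumed-relevant subset and
# measure how much the concept's predicted probability shifts.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_FEATURES, N_CONCEPTS = 16, 3

# Stand-in concept predictor: input features -> concept logits (hypothetical).
concept_predictor = nn.Sequential(
    nn.Linear(N_FEATURES, 32),
    nn.ReLU(),
    nn.Linear(32, N_CONCEPTS),
)

def locality_gap(predictor, x, concept_idx, relevant_mask,
                 n_perturbations=100, scale=1.0):
    """Mean absolute change in concept `concept_idx`'s predicted probability
    when Gaussian noise is added only to features outside `relevant_mask`."""
    with torch.no_grad():
        base = torch.sigmoid(predictor(x))[:, concept_idx]
        gaps = []
        for _ in range(n_perturbations):
            noise = scale * torch.randn_like(x)
            noise[:, relevant_mask] = 0.0  # never touch the concept's own features
            perturbed = torch.sigmoid(predictor(x + noise))[:, concept_idx]
            gaps.append((perturbed - base).abs().mean())
    return torch.stack(gaps).mean().item()

# Example: assume (hypothetically) concept 0 is localised to features 0-3.
x = torch.randn(8, N_FEATURES)
relevant = torch.zeros(N_FEATURES, dtype=torch.bool)
relevant[:4] = True
print(f"locality gap for concept 0: {locality_gap(concept_predictor, x, 0, relevant):.4f}")
```

A predictor that respects localities should yield a gap near zero; a large gap indicates the concept's prediction leans on features outside its designated subset, i.e. potentially spurious correlations.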
Authors: Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik