Counterfactual Concept Bottleneck Models (2402.01408v3)
Abstract: Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), simulate changes in the situation to evaluate how this impacts class predictions (the "How?"), and imagine how the scenario should change to result in different class predictions (the "Why not?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and improving human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our experimental results demonstrate that CF-CBMs: achieve classification accuracy comparable to black-box models and existing CBMs ("What?"), rely on fewer important concepts leading to simpler explanations ("How?"), and produce interpretable, concept-based counterfactuals ("Why not?"). Additionally, we show that training the counterfactual generator jointly with the CBM leads to two key improvements: (i) it alters the model's decision-making process, making the model rely on fewer important concepts (leading to simpler explanations), and (ii) it significantly increases the causal effect of concept interventions on class predictions, making the model more responsive to these changes.
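To make the three query types concrete, the sketch below shows one possible way to wire a concept bottleneck to a counterfactual head in PyTorch. This is a minimal illustration under stated assumptions: the class name `CFCBMSketch`, the layer sizes, and the simple deterministic generator are all hypothetical stand-ins, not the authors' implementation, whose counterfactual generator is trained jointly with the CBM and is more involved than this toy module.

```python
# Minimal sketch of a concept bottleneck model with a counterfactual head.
# All module names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CFCBMSketch(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.n_classes = n_classes
        # "What?": map inputs to interpretable concept activations, then to labels.
        self.concept_encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        self.label_predictor = nn.Linear(n_concepts, n_classes)
        # "Why not?": a generator proposing alternative concept vectors,
        # conditioned on the current concepts and a desired target class.
        # (A deterministic stand-in for a jointly trained generator.)
        self.cf_generator = nn.Sequential(
            nn.Linear(n_concepts + n_classes, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_encoder(x))
        logits = self.label_predictor(concepts)
        return concepts, logits

    def counterfactual(self, concepts, target_class):
        # Propose concept changes intended to flip the prediction to target_class.
        one_hot = F.one_hot(target_class, self.n_classes).float()
        cf_concepts = torch.sigmoid(
            self.cf_generator(torch.cat([concepts, one_hot], dim=-1))
        )
        cf_logits = self.label_predictor(cf_concepts)
        return cf_concepts, cf_logits


if __name__ == "__main__":
    model = CFCBMSketch(n_features=16, n_concepts=8, n_classes=3)
    x = torch.randn(4, 16)
    concepts, logits = model(x)                      # "What?"

    # "How?": intervene on a concept and re-run the label predictor.
    intervened = concepts.clone()
    intervened[:, 0] = 1.0
    print(model.label_predictor(intervened).argmax(dim=-1))

    # "Why not?": ask for concept vectors that would support class 2 instead.
    cf_concepts, cf_logits = model.counterfactual(concepts, torch.full((4,), 2))
    print(cf_logits.argmax(dim=-1))
```

In this sketch, "How?" queries are answered by editing the concept activations directly and observing the effect on the label predictor, while "Why not?" queries are answered by the counterfactual head in a single forward pass rather than a post-hoc search; how the generator is parameterized and trained jointly with the bottleneck is specific to the paper and not reproduced here.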