Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? (2401.13544v3)
Abstract: Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs). An advantage of this model class is the user's ability to intervene on predicted concept values, thereby affecting the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, given only a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of our techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes are more intervenable than CBMs. Lastly, we establish that our methods remain effective when concept annotations are produced by a vision-language model, alleviating the need for a human-annotated validation set.
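To make the mechanism concrete, the sketch below illustrates one way such an intervention can be realised on a black box: a linear probe, fitted on the small concept-labelled validation set, maps the network's intermediate representation to concepts, and at test time the representation is edited so the probe matches user-specified concept values before the downstream head is applied. Everything here (the `InterveneableBlackBox` class, the encoder/head split, the optimisation-based edit) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: intervening on a black-box classifier via a
# linear concept probe. The encoder/head split, the probe, and the edit
# procedure are assumptions for exposition, not the paper's released code.
import torch
import torch.nn as nn

class InterveneableBlackBox(nn.Module):
    def __init__(self, encoder: nn.Module, head: nn.Module,
                 z_dim: int, num_concepts: int):
        super().__init__()
        self.encoder, self.head = encoder, head
        # Linear probe z -> concept logits, trained beforehand on the
        # small concept-labelled validation set.
        self.probe = nn.Linear(z_dim, num_concepts)

    def forward(self, x):
        return self.head(self.encoder(x))

    @torch.enable_grad()
    def intervene(self, x, c_prime, steps=100, lr=0.1, reg=1e-2):
        """Edit the representation so the probe predicts the user-supplied
        concept values c_prime (floats in [0, 1]), then re-predict."""
        z = self.encoder(x).detach()
        delta = torch.zeros_like(z, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        bce = nn.BCEWithLogitsLoss()
        for _ in range(steps):
            opt.zero_grad()
            # Match the requested concepts while keeping the edit small.
            loss = bce(self.probe(z + delta), c_prime) + reg * delta.norm()
            loss.backward()
            opt.step()
        return self.head(z + delta)
```

Under this reading, the intervenability measure described in the abstract can be estimated as the average improvement in the downstream loss when predictions from the edited representation replace the original ones, and fine-tuning the black box amounts to optimising that quantity; this is an interpretation of the abstract, not the paper's formal definition.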