Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? (2401.13544v3)

Published 24 Jan 2024 in cs.LG and stat.ML

Abstract: Recently, interpretable machine learning has re-explored concept bottleneck models (CBM). An advantage of this model class is the user's ability to intervene on predicted concept values, affecting the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, only given a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of our techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes are more intervenable than CBMs. Lastly, we establish that our methods are still effective under vision-language-model-based concept annotations, alleviating the need for a human-annotated validation set.

Introduction

Interpretable machine learning has recently directed significant attention toward Concept Bottleneck Models (CBMs), which enable human intervention at the level of high-level attributes, or concepts. This is particularly advantageous because users can directly influence model predictions by editing predicted concept values. A critical obstacle for CBMs, however, is that they require concept knowledge and annotations at training time, which can be impractical or unattainable in many real-world scenarios.

Beyond Concept Bottleneck Models

The paper addresses this challenge by presenting a technique for performing concept-based interventions on non-interpretable, pre-trained neural networks, without requiring concept annotations during the initial training. The work is grounded in intervenability, a new measure that quantifies how amenable a model is to concept-based interventions and that also serves as an objective for fine-tuning black-box models so that they respond better to such interventions. A key premise is that the original model's architecture and learned representations are preserved, which is critical for knowledge transfer and for maintaining performance across diverse tasks.
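For orientation, intervenability can be written schematically as the expected reduction in prediction loss obtained by intervening on concepts. The notation below (the factorisation f = h ∘ g, the intervened prediction f', and the intervention strategy π) is an illustrative reconstruction from the summary above, not the paper's verbatim definition:

```latex
% Schematic form of intervenability (notation assumed for illustration):
% f = h \circ g is the black box, f'(x; c') its prediction after intervening
% with concept values c', and \pi an intervention strategy over concepts.
\mathcal{I}(f) \;=\; \mathbb{E}_{(x,\, c,\, y) \sim \mathcal{D},\; c' \sim \pi}
  \Bigl[ \mathcal{L}\bigl(y,\, f(x)\bigr) \;-\; \mathcal{L}\bigl(y,\, f'(x;\, c')\bigr) \Bigr]
```

Read this way, fine-tuning for intervenability encourages the intervened predictions f'(x; c') to incur low loss, so that edits to the concepts translate into reliable improvements in the downstream output.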

Methods and Contributions

The approach involves a three-step intervention procedure: first, a probing function is trained to map intermediate representations to concept values; second, these representations are edited to reflect the desired concept interventions; and third, the final model output is recomputed from the edited representations. Notably, this approach requires only a small validation set with concept annotations for probing. Leveraging the formalised notion of intervenability, the authors then introduce a fine-tuning procedure that does not alter the model's architecture, which makes the strategy applicable to diverse pre-trained neural networks.
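As a concrete illustration, the sketch below implements the three steps in PyTorch under stated assumptions: the black box factors into a feature extractor g and a prediction head h, the probe is linear, concepts are binary (given as float tensors), and the representation edit trades off agreement with the probe against a squared-distance proximity penalty. The helper names (fit_probe, intervene) and all hyperparameters are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_probe(g, x_val, c_val, repr_dim, n_concepts, epochs=200, lr=1e-2):
    """Step 1: train a linear probe from representations z = g(x) to concept logits."""
    probe = nn.Linear(repr_dim, n_concepts)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    with torch.no_grad():
        z_val = g(x_val)                      # frozen black-box representations
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(probe(z_val), c_val)
        loss.backward()
        opt.step()
    return probe

def intervene(g, h, probe, x, c_target, lam=0.1, steps=100, lr=0.1):
    """Steps 2-3: edit z so the probe matches c_target, then recompute the output."""
    with torch.no_grad():
        z = g(x)
    z_edit = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z_edit], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Agreement between the probe's concept predictions and the desired concepts
        concept_fit = F.binary_cross_entropy_with_logits(probe(z_edit), c_target)
        # Stay close to the original representation
        proximity = lam * ((z_edit - z) ** 2).sum(dim=-1).mean()
        (concept_fit + proximity).backward()
        opt.step()
    with torch.no_grad():
        return h(z_edit)                      # prediction after the intervention
```

Here g and h stand for a frozen backbone split at a chosen intermediate layer and the remaining layers, respectively; only the small concept-annotated validation set is needed to fit the probe.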

The work also examines several fine-tuning paradigms and contrasts them with the proposed intervenability-driven method. These comparative studies support the validity of the new approach, demonstrating improved intervention effectiveness and better-calibrated predictions relative to common baselines.

Empirical Evaluation

Extensive experiments on synthetic benchmarks and real-world applications, such as deep chest X-ray classification, illustrate the practical implications of the proposed method. While CBMs show their expected strength in scenarios where the data-generating process depends heavily on the concepts, the newly introduced fine-tuning strategy rivals or even surpasses CBMs in more complex setups, including cases where the concepts are not sufficient to fully capture the relationship between inputs and outputs.

Conclusion

This work represents a significant step for interpretable machine learning, offering a compelling solution for enhancing the intervention capabilities of opaque neural network models. The methods developed extend intervenability to real-world applications, providing a mechanism to mediate between interpretability and performance while allowing existing black-box models to benefit from human-expert interaction. The paper also sets the stage for further exploration of optimal intervention strategies, the integration of automated concept discovery, and the implications of both for evaluating and refining large pre-trained models.

Authors (4)
  1. Ričards Marcinkevičs
  2. Sonia Laguna
  3. Moritz Vandenhirtz
  4. Julia E. Vogt