Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models (2310.03182v1)
Abstract: Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate patient diagnosis. However, two challenges arise when deploying deep learning models in real-world healthcare applications. First, neural models tend to learn spurious correlations instead of the desired features, which can cause them to fall short when generalizing to new domains (e.g., patients of different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision, for reasons of trustworthiness and safety. In this paper, to address these two limitations, we propose a new paradigm for building robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method mitigates spurious correlations and thus substantially outperforms standard visual encoders and other baselines. Finally, through case studies on real medical data, we show how classification with a small number of concepts provides interpretability for understanding model decisions.
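
To make the pipeline concrete, below is a minimal sketch of the concept-bottleneck step described in the abstract: latent image features are scored against text embeddings of clinical concepts (e.g., obtained by prompting GPT-4) in a shared CLIP-style embedding space, and a linear head over those concept scores produces the class prediction. The module name, dimensions, and the random stand-in tensors are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of a concept-bottleneck classifier on top of a CLIP-style
# vision-language model. Encoder outputs are simulated with random tensors;
# in practice they would come from the model's image and text encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, concept_embeddings: torch.Tensor, num_classes: int):
        """
        concept_embeddings: (num_concepts, dim) text embeddings of clinical
        concepts, e.g., encodings of GPT-4-generated concept descriptions.
        """
        super().__init__()
        # Freeze the normalized concept embeddings; only the head is trained.
        self.register_buffer("concepts", F.normalize(concept_embeddings, dim=-1))
        self.head = nn.Linear(concept_embeddings.shape[0], num_classes)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Project latent image features onto the explicit concept space:
        # each score is the cosine similarity with one clinical concept.
        img = F.normalize(image_features, dim=-1)
        concept_scores = img @ self.concepts.T  # (batch, num_concepts)
        # A linear layer over a small set of named concepts keeps the
        # decision interpretable: each logit is a weighted sum of concepts.
        return self.head(concept_scores)


# Usage with random stand-ins for the encoders' outputs:
dim, num_concepts, num_classes = 512, 16, 2
concept_emb = torch.randn(num_concepts, dim)   # from the text encoder
model = ConceptBottleneckClassifier(concept_emb, num_classes)
image_feats = torch.randn(4, dim)              # from the image encoder
logits = model(image_feats)                    # (4, num_classes)
```

Because each logit is a weighted sum over a small, named set of concepts, the head's weights can be read off directly to explain an individual prediction, which is the interpretability property the case studies rely on.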