Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models (2403.18996v1)

Published 27 Mar 2024 in cs.CV

Abstract: Explaining deep learning models is becoming increasingly important as new multimodal models emerge daily, particularly in safety-critical domains like medical imaging. However, the lack of detailed investigation into how explainability methods perform on these models is widening the gap between their development and safe deployment. In this work, we analyze the performance of various explainable AI methods on a vision-language model, MedCLIP, to demystify its inner workings. We also provide a simple methodology for overcoming the shortcomings of these methods. Our work offers a new perspective on the explainability of a recent, well-known VLM in the medical domain, and our assessment method generalizes to other current and future VLMs.
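The kind of analysis the abstract describes, attributing an image-text similarity score back to the pixels that drive it, can be sketched with a toy stand-in model. The snippet below is a minimal illustration, not MedCLIP's actual architecture: it uses a hypothetical linear "image encoder" and central finite differences in NumPy to produce a per-pixel saliency map for a similarity score (in practice one would use a gradient-based attribution library on the real encoders).

```python
import numpy as np

def similarity(image, text_emb, W):
    # Toy image-text similarity: linear image encoder W,
    # then a dot product with the text embedding.
    return text_emb @ (W @ image.ravel())

def saliency_map(image, text_emb, W, eps=1e-4):
    # Central finite differences: how much each pixel
    # perturbation moves the similarity score.
    flat = image.ravel().copy()
    grads = np.empty_like(flat)
    for i in range(flat.size):
        orig = flat[i]
        flat[i] = orig + eps
        hi = text_emb @ (W @ flat)
        flat[i] = orig - eps
        lo = text_emb @ (W @ flat)
        flat[i] = orig  # restore the pixel
        grads[i] = (hi - lo) / (2 * eps)
    return grads.reshape(image.shape)

rng = np.random.default_rng(0)
image = rng.random((8, 8))         # stand-in for a chest X-ray
text_emb = rng.random(4)           # stand-in for an encoded report
W = rng.random((4, image.size))    # hypothetical linear "image encoder"

sal = saliency_map(image, text_emb, W)
```

Because the toy model is linear, the finite-difference map coincides with the exact gradient `(text_emb @ W)` reshaped to the image; for real VLMs the same idea is applied via autograd rather than perturbation.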

Authors (3)
  1. Anees Ur Rehman Hashmi
  2. Dwarikanath Mahapatra
  3. Mohammad Yaqub
Citations (2)