
A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis (2307.01981v1)

Published 5 Jul 2023 in eess.IV, cs.CV, and cs.LG

Abstract: Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great performance for zero-shot natural image recognition and exhibit benefits in medical applications. However, an explainable zero-shot medical image recognition framework with promising performance is yet under development. In this paper, we propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis, mimicking the diagnostic process performed by human experts. The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge, such as disease symptoms or descriptions other than a single category name, to help provide more accurate and explainable diagnosis in CLIP. We further design specific prompts to enhance the quality of generated texts by ChatGPT that describe visual medical features. Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline, corroborating the great potential of VLMs and LLMs for medical applications.
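To make the pipeline concrete, the following is a minimal sketch of the described training-free idea: score an image against several LLM-generated symptom descriptions per disease instead of a single category name, then average within each category. It assumes a generic CLIP checkpoint from Hugging Face; the checkpoint name, the `descriptions` dictionary, and the image path are illustrative placeholders, not the paper's actual prompts or data.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Generic CLIP backbone (placeholder; the paper's exact model may differ).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical LLM-generated visual descriptions for each disease category,
# standing in for the ChatGPT-generated cues the paper uses.
descriptions = {
    "pneumonia": [
        "a chest X-ray with patchy areas of increased opacity",
        "a chest X-ray showing consolidation in the lung fields",
    ],
    "normal": [
        "a chest X-ray with clear, well-aerated lung fields",
        "a chest X-ray with no focal opacities",
    ],
}

image = Image.open("query_xray.png")  # placeholder query image

with torch.no_grad():
    image_inputs = processor(images=image, return_tensors="pt")
    image_emb = model.get_image_features(**image_inputs)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

    scores = {}
    for category, texts in descriptions.items():
        text_inputs = processor(text=texts, return_tensors="pt", padding=True)
        text_emb = model.get_text_features(**text_inputs)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        # Average image-description similarity within the category, so each
        # symptom cue contributes to the final diagnostic score.
        scores[category] = (image_emb @ text_emb.T).mean().item()

prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

Because the per-description similarities are inspected individually before averaging, the same scores double as an explanation: the highest-scoring descriptions indicate which visual cues drove the diagnosis.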

Authors (6)
  1. Jiaxiang Liu (39 papers)
  2. Tianxiang Hu (13 papers)
  3. Yan Zhang (954 papers)
  4. Xiaotang Gai (5 papers)
  5. Yang Feng (230 papers)
  6. Zuozhu Liu (78 papers)
Citations (25)