Can language-guided unsupervised adaptation improve medical image classification using unpaired images and texts?
Abstract: In medical image classification, supervised learning is challenging due to the scarcity of labeled medical images. To address this, we leverage the visual-textual alignment within Vision-Language Models (VLMs) to enable unsupervised learning of a medical image classifier. In this work, we propose \underline{Med}ical \underline{Un}supervised \underline{A}daptation (\texttt{MedUnA}) of VLMs, where the LLM-generated descriptions for each class are encoded into text embeddings and matched with class labels via a cross-modal adapter. This adapter attaches to a visual encoder of \texttt{MedCLIP} and aligns the visual embeddings through unsupervised learning, driven by a contrastive entropy-based loss and prompt tuning, thereby improving performance in scenarios where textual information is more abundant than labeled images, particularly in the healthcare domain. Unlike traditional VLMs, \texttt{MedUnA} uses \textbf{unpaired images and text} for learning representations and enhances the potential of VLMs beyond traditional constraints. We evaluate the performance on three chest X-ray datasets and two multi-class datasets (diabetic retinopathy and skin lesions), showing significant accuracy gains over the zero-shot baseline. Our code is available at https://github.com/rumaima/meduna.
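The two-stage idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the linear adapter, the two-dimensional embeddings, and all names (`adapter_logits`, `text_loss`, `image_loss`) are assumptions made for illustration, and plain Shannon entropy minimization stands in for the paper's contrastive entropy-based loss, whose exact form the abstract does not give. Stage one scores labeled text embeddings with cross-entropy; stage two scores unlabeled image embeddings by how confident the same adapter is on them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adapter_logits(weights, embedding):
    """Hypothetical linear adapter: one weight row per class."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def cross_entropy(probs, label):
    """Supervised loss for a text embedding whose class label is known."""
    return -math.log(probs[label])


def entropy(probs):
    """Shannon entropy of a prediction; low entropy means high confidence."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Toy stand-ins (assumed values, not real encoder outputs).
# Each text embedding comes from a class description, so its label is known.
text_embeddings = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
# Image embeddings are unlabeled and unpaired with the texts.
image_embeddings = [[0.9, 0.1], [0.2, 0.8]]
# Adapter weights, shared across both modalities.
weights = [[2.0, 0.0], [0.0, 2.0]]

# Stage 1: the adapter learns to map description embeddings to class labels.
text_loss = sum(
    cross_entropy(softmax(adapter_logits(weights, emb)), label)
    for emb, label in text_embeddings
) / len(text_embeddings)

# Stage 2: the same adapter is applied to visual embeddings, and the
# unsupervised objective rewards confident (low-entropy) predictions.
image_loss = sum(
    entropy(softmax(adapter_logits(weights, emb))) for emb in image_embeddings
) / len(image_embeddings)

print(f"text cross-entropy: {text_loss:.4f}")
print(f"image entropy:      {image_loss:.4f}")
```

In a real setting both losses would be minimized by gradient descent over the adapter (and, per the abstract, learnable prompts), with embeddings produced by the text and visual encoders rather than hand-written vectors.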