VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge (2408.02865v1)
Abstract: The need for improved diagnostic methods in ophthalmology is acute, especially in less developed regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and further refined using our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multi-disease diagnosis, clinical explanation, and patient interaction, making it a highly versatile tool for initial ophthalmic disease screening. VisionUnite can also serve as an educational aid for junior ophthalmologists, accelerating their acquisition of knowledge regarding both common and rare ophthalmic conditions. VisionUnite represents a significant advancement in ophthalmology, with broad implications for diagnostics, medical education, and understanding of disease mechanisms.