OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue (2306.12174v2)

Published 21 Jun 2023 in cs.CV

Abstract: Large multimodal models (LMMs) have achieved significant success in general domains. However, because medical images and text differ substantially from general web content, the performance of LMMs in medical scenarios is limited. In ophthalmology, clinical diagnosis relies on multiple modalities of medical images, yet multimodal ophthalmic LLMs have not been explored to date. In this paper, we study and construct an ophthalmic large multimodal model. First, we use fundus images as an entry point to build a disease assessment and diagnosis pipeline that performs common ophthalmic disease diagnosis and lesion segmentation. We then establish a new ophthalmic multimodal instruction-following and dialogue fine-tuning dataset based on disease-related knowledge data and publicly available real-world medical dialogue. Finally, we introduce visual ability into the LLM to complete the ophthalmic large language and vision assistant (OphGLM). Our experimental results demonstrate that OphGLM performs exceptionally well and has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be made publicly available at https://github.com/ML-AILab/OphGLM.
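The abstract describes combining the fundus pipeline's outputs (disease diagnosis and lesion segmentation) with knowledge data and real-world medical dialogue to build instruction-following fine-tuning samples. The sketch below illustrates what serializing such pipeline outputs into one instruction-tuning record might look like; the FundusFindings dataclass, the build_instruction_sample function, and the record fields are hypothetical assumptions for illustration, not the paper's actual data format.

```python
# Minimal sketch (not the authors' code): turn hypothetical fundus-pipeline
# outputs into a single instruction-following record for LLM fine-tuning.
import json
from dataclasses import dataclass, field


@dataclass
class FundusFindings:
    """Hypothetical container for the fundus pipeline's outputs."""
    diagnosis: str                                # e.g. "diabetic retinopathy, moderate"
    lesions: list = field(default_factory=list)   # e.g. ["microaneurysms", "hard exudates"]


def build_instruction_sample(findings: FundusFindings, question: str) -> dict:
    """Combine image-derived findings with a patient question into one
    instruction-tuning record (schema is illustrative only)."""
    context = (
        f"Fundus image findings: diagnosis = {findings.diagnosis}; "
        f"lesions = {', '.join(findings.lesions) or 'none detected'}."
    )
    return {
        "instruction": "You are an ophthalmology assistant. Answer using the image findings.",
        "input": f"{context}\nPatient question: {question}",
        "output": "",  # to be filled from knowledge data or real dialogue responses
    }


if __name__ == "__main__":
    findings = FundusFindings(
        diagnosis="diabetic retinopathy, moderate",
        lesions=["microaneurysms", "hard exudates"],
    )
    sample = build_instruction_sample(findings, "Do I need laser treatment?")
    print(json.dumps(sample, indent=2, ensure_ascii=False))
```

In the paper, records of this kind are paired with disease-related knowledge and publicly available medical dialogues before fine-tuning, so the actual schema and contents would differ from this sketch.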

Authors (13)
  1. Weihao Gao (30 papers)
  2. Zhuo Deng (16 papers)
  3. Zhiyuan Niu (3 papers)
  4. Fuju Rong (2 papers)
  5. Chucheng Chen (2 papers)
  6. Zheng Gong (69 papers)
  7. Wenze Zhang (3 papers)
  8. Daimin Xiao (1 paper)
  9. Fang Li (142 papers)
  10. Zhenjie Cao (2 papers)
  11. Zhaoyi Ma (1 paper)
  12. Wenbin Wei (4 papers)
  13. Lan Ma (31 papers)
Citations (29)