Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 45 tok/s
Gemini 2.5 Pro 54 tok/s Pro
GPT-5 Medium 22 tok/s Pro
GPT-5 High 20 tok/s Pro
GPT-4o 99 tok/s Pro
Kimi K2 183 tok/s Pro
GPT OSS 120B 467 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening (2507.01278v1)

Published 2 Jul 2025 in cs.CL

Abstract: LLMs can simulate clinical reasoning based on natural language prompts, but their utility in ophthalmology is largely unexplored. This study evaluated GPT-4's ability to interpret structured textual descriptions of retinal fundus photographs and simulate clinical decisions for diabetic retinopathy (DR) and glaucoma screening, including the impact of adding real or synthetic clinical metadata. We conducted a retrospective diagnostic validation study using 300 annotated fundus images. GPT-4 received structured prompts describing each image, with or without patient metadata. The model was tasked with assigning an ICDR severity score, recommending DR referral, and estimating the cup-to-disc ratio for glaucoma referral. Performance was evaluated using accuracy, macro and weighted F1 scores, and Cohen's kappa. McNemar's test and change rate analysis were used to assess the influence of metadata. GPT-4 showed moderate performance for ICDR classification (accuracy 67.5%, macro F1 0.33, weighted F1 0.67, kappa 0.25), driven mainly by correct identification of normal cases. Performance improved in the binary DR referral task (accuracy 82.3%, F1 0.54, kappa 0.44). For glaucoma referral, performance was poor across all settings (accuracy ~78%, F1 <0.04, kappa <0.03). Metadata inclusion did not significantly affect outcomes (McNemar p > 0.05), and predictions remained consistent across conditions. GPT-4 can simulate basic ophthalmic decision-making from structured prompts but lacks precision for complex tasks. While not suitable for clinical use, LLMs may assist in education, documentation, or image annotation workflows in ophthalmology.

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.