Evaluating Large Language Models in Ophthalmology (2311.04933v1)

Published 7 Nov 2023 in cs.CL and cs.AI

Abstract: Purpose: The performance of three LLMs (GPT-3.5, GPT-4, and PaLM2) in answering professional ophthalmology questions was evaluated and compared with that of three professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to the three LLMs and to the three human groups. LLM performance was evaluated and compared with that of the human groups in terms of average score, stability, and confidence. Results: Overall, each LLM outperformed the undergraduates; GPT-3.5 and PaLM2 scored slightly below the master's level, while GPT-4 performed at a level comparable to that of attending physicians. In addition, GPT-4 showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2. Conclusion: Our study shows that LLMs, represented by GPT-4, perform well in the field of ophthalmology. With further improvements, LLMs will bring unexpected benefits to medical education and clinical decision making in the near future.
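
As a rough illustration of two of the metrics named in the abstract (average score and answer stability), the sketch below scores repeated runs of a single-choice test. The abstract does not define either metric, so the definitions here are assumptions: the average score is taken as the mean percentage correct over runs, and stability as the fraction of items answered identically in every run. The function names, the answer key, and the toy responses are all hypothetical, and the "confidence" metric is left out because the abstract gives no basis for it.

```python
def average_score(runs, answer_key):
    """Mean percentage of correct answers across repeated runs (assumed definition)."""
    per_run = [
        100.0 * sum(given == correct for given, correct in zip(run, answer_key))
        / len(answer_key)
        for run in runs
    ]
    return sum(per_run) / len(per_run)


def answer_stability(runs):
    """Fraction of items answered identically in every run (assumed definition)."""
    n_items = len(runs[0])
    stable = sum(
        1 for i in range(n_items) if len({run[i] for run in runs}) == 1
    )
    return stable / n_items


# Hypothetical toy data: a 4-item key and three repeated runs of one model.
answer_key = ["A", "C", "B", "D"]
runs = [
    ["A", "C", "B", "A"],
    ["A", "C", "D", "A"],
    ["A", "C", "B", "A"],
]

print(f"average score: {average_score(runs, answer_key):.1f}%")
print(f"answer stability: {answer_stability(runs):.2f}")
```

Note that under these assumed definitions the two metrics are independent: a model can be stable and wrong (item 4 in the toy data) or correct on average but inconsistent across runs.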

Authors (14)
  1. Jason Holmes (19 papers)
  2. Shuyuan Ye (1 paper)
  3. Yiwei Li (107 papers)
  4. Shi-Nan Wu (2 papers)
  5. Zhengliang Liu (91 papers)
  6. Zihao Wu (100 papers)
  7. Jinyu Hu (4 papers)
  8. Huan Zhao (109 papers)
  9. Xi Jiang (53 papers)
  10. Wei Liu (1135 papers)
  11. Hong Wei (10 papers)
  12. Jie Zou (32 papers)
  13. Tianming Liu (161 papers)
  14. Yi Shao (8 papers)
Citations (1)