Towards Accurate Differential Diagnosis with Large Language Models (2312.00164v1)

Published 30 Nov 2023 in cs.CY and cs.AI

Abstract: An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by LLMs present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.


Towards Accurate Differential Diagnosis with LLMs

The paper "Towards Accurate Differential Diagnosis with LLMs" addresses the application of LLMs in the domain of diagnostic reasoning, a cornerstone of medical practice. The researchers introduce an LLM tailored for differential diagnosis (DDx) tasks, and the paper investigates its standalone effectiveness as well as its role as an aid to clinicians navigating complex diagnostic challenges.

Methodology and Evaluation

The researchers conducted a two-stage study in which 20 board-certified clinicians evaluated 302 challenging medical cases sourced from New England Journal of Medicine (NEJM) case reports. Each case was read by two clinicians, who were randomized to one of two assistive conditions: one had access to search engines and standard medical resources, while the other could additionally use the LLM. All clinicians first provided a baseline, unassisted DDx before using their assigned tools. Performance was measured by top-10 accuracy, on which the standalone LLM outperformed unassisted clinicians (59.1% vs. 33.6%, p = 0.04).
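To make the top-10 accuracy metric concrete, here is a minimal sketch of how a top-k accuracy could be computed. It assumes each case has already been reduced to a ranked list of booleans marking whether the DDx item at that rank matches the final diagnosis; how that match is judged is not covered here, and the function name `top_k_accuracy` and the toy data are hypothetical, not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(match_flags: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Fraction of cases whose first k ranked DDx items contain a match.

    match_flags[i][j] is True if the item at rank j+1 of case i's DDx list
    matches the final diagnosis (assumed to be judged upstream).
    """
    if not match_flags:
        raise ValueError("need at least one case")
    # A case counts as a hit if any of its first k items is a match.
    hits = sum(1 for flags in match_flags if any(flags[:k]))
    return hits / len(match_flags)


# Toy data (hypothetical): four cases with differing ranks of the correct item.
cases = [
    [False, True, False],    # correct diagnosis at rank 2
    [False, False, False],   # correct diagnosis never listed
    [True],                  # correct diagnosis at rank 1
    [False] * 10 + [True],   # correct diagnosis at rank 11, outside the top 10
]

print(top_k_accuracy(cases, k=1))   # 0.25
print(top_k_accuracy(cases, k=10))  # 0.5
```

Under this definition, a longer differential list can only help up to rank k, which is why the metric is reported at a fixed cutoff such as 10.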

The LLM's DDx lists were evaluated for quality, appropriateness, and comprehensiveness against those generated by clinicians. The model achieved high scores across these criteria, with 54% of its lists receiving a quality score indicating that the final diagnosis was included. These ratings suggest that the LLM can generate more complete and relevant differential lists than unassisted clinicians.

Key Findings

Stand-Alone Performance: The LLM for DDx outperformed unassisted clinicians on its own, achieving higher top-N accuracy (top-10 accuracy of 59.1% vs. 33.6%) and more comprehensive differential lists.
Assistive Impact: Clinicians assisted by the LLM reached higher top-10 accuracy (51.7%) than clinicians without its assistance (36.1%; McNemar's test: 45.7, p < 0.01) and clinicians using search (44.4%; McNemar's test: 4.75, p = 0.03). LLM assistance also increased the diversity and length of the differential lists, indicating its utility in broadening the set of diagnoses considered.
Interface and Interaction: The user interface allowed clinicians to engage with the LLM conversationally. Integrating the LLM into the diagnostic workflow did not slow clinicians down: time spent on the DDx tasks was comparable to that with usual internet search methods.
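The abstract reports McNemar's test statistics for the comparisons between study arms. As a rough illustration of how such a statistic arises from paired binary outcomes, the sketch below computes the chi-square form of McNemar's test from the two discordant counts. The counts used are hypothetical, and whether the authors applied a continuity correction is not stated in this summary, so both variants are offered as assumptions.

```python
import math
from typing import Tuple


def mcnemar_test(b: int, c: int, continuity_correction: bool = False) -> Tuple[float, float]:
    """Chi-square McNemar test on discordant pair counts.

    b: pairs where condition A was correct and condition B was not.
    c: pairs where condition B was correct and condition A was not.
    Returns (statistic, p_value) using a chi-square distribution with 1 dof.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of a difference.
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity_correction:
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / n
    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, not taken from the paper.
stat, p = mcnemar_test(b=30, c=10)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Only the discordant pairs enter the statistic; cases where both conditions were right or both were wrong carry no information about which condition is better.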

Implications

This research has both practical and theoretical implications. Practically, LLMs like the one developed here could be a valuable tool for assisting clinicians during challenging diagnostic tasks. The ability to consider a broader range of potential diagnoses could improve diagnostic reasoning, particularly in complex cases.

Theoretically, this work expands on the potential applications of LLMs beyond typical natural language processing tasks, delineating their role in intricate problem-solving domains such as healthcare diagnostics. The findings suggest avenues for integrating AI into clinical workflows, emphasizing the need for further explorations into the collaborative dynamics between human professionals and AI systems.

Future Directions

The paper opens several prospects for future research:

Real-World Implementation: Further investigation is needed into the real-world application of LLMs in diverse clinical settings, including their impact on patient outcomes, time efficiency, and clinician satisfaction.
Enhancing AI Models: Continuous development of LLMs to incorporate multimodal inputs, such as laboratory and imaging data, could offer a more holistic approach to diagnostics.
Education and Training: The potential of LLMs in medical education and upskilling clinicians in diagnostic reasoning should be rigorously explored.

Conclusion

The paper demonstrates that LLMs have substantial potential as tools for augmenting clinical diagnostic processes. By achieving promising results in differential diagnosis tasks, these models could significantly aid in deploying specialist-level diagnostic capabilities across varied healthcare contexts, enhancing both access and quality of care. However, further real-world validations and studies are essential to ensure the safe and effective integration of these advanced AI systems into medical practice.


Authors (28)

Daniel McDuff (88 papers)
Mike Schaekermann (20 papers)
Tao Tu (45 papers)
Anil Palepu (12 papers)
Amy Wang (6 papers)
Jake Garrison (8 papers)
Karan Singhal (26 papers)
Yash Sharma (45 papers)
Shekoofeh Azizi (23 papers)
Kavita Kulkarni (7 papers)
Le Hou (36 papers)
Yong Cheng (58 papers)
Yun Liu (213 papers)
S Sara Mahdavi (45 papers)
Sushant Prakash (15 papers)
Anupam Pathak (3 papers)
Christopher Semturs (12 papers)
Shwetak Patel (58 papers)
Dale R Webster (23 papers)
Ewa Dominowska (4 papers)
