
Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis (2505.03467v1)

Published 6 May 2025 in cs.CL

Abstract: Explainable disease diagnosis, which leverages patient information (e.g., signs and symptoms) and computational models to generate probable diagnoses and reasonings, offers clear clinical values. However, when clinical notes encompass insufficient evidence for a definite diagnosis, such as the absence of definitive symptoms, diagnostic uncertainty usually arises, increasing the risk of misdiagnosis and adverse outcomes. Although explicitly identifying and explaining diagnostic uncertainties is essential for trustworthy diagnostic systems, it remains under-explored. To fill this gap, we introduce ConfiDx, an uncertainty-aware LLM created by fine-tuning open-source LLMs with diagnostic criteria. We formalized the task and assembled richly annotated datasets that capture varying degrees of diagnostic ambiguity. Evaluating ConfiDx on real-world datasets demonstrated that it excelled in identifying diagnostic uncertainties, achieving superior diagnostic performance, and generating trustworthy explanations for diagnoses and uncertainties. To our knowledge, this is the first study to jointly address diagnostic uncertainty recognition and explanation, substantially enhancing the reliability of automatic diagnostic systems.

Summary

  • The paper introduces ConfiDx, a novel model enhancing automatic disease diagnosis by recognizing uncertainty and generating explanations, addressing a key gap in traditional systems.
  • By formalizing diagnostic uncertainty and fine-tuning LLMs on clinical criteria using annotated datasets, ConfiDx achieved significantly higher diagnostic accuracy and improved uncertainty recognition over baseline models.
  • Integrating diagnostic criteria into LLMs promises to improve clinical reliability and reduce misdiagnoses by providing clinicians with the detailed explanations needed for patient care decisions.

Uncertainty-Aware LLMs for Explainable Disease Diagnosis

The paper "Uncertainty-Aware LLMs for Explainable Disease Diagnosis" introduces ConfiDx, a model designed to enhance automatic diagnostic systems by jointly addressing diagnostic uncertainty recognition and explanation. The work tackles a critical aspect that traditional diagnostic models often overlook: the ability to manage diagnostic uncertainty when patient data is incomplete or ambiguous.

Formal Definition and Contribution

The authors begin by formalizing the concept of uncertainty-aware disease diagnosis. This involves developing models that, when provided with clinical information, can offer likely diagnoses while also recognizing when the available data is insufficient for a definitive diagnosis. This dual capability not only enhances the trustworthiness of diagnostic systems but also provides narrative explanations that can be pivotal for clinicians making informed decisions.
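
While the paper formalizes this task in its own notation, a minimal sketch of what such a dual output might look like as a data structure is shown below; the field names and example values are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DiagnosticOutput:
    """Hypothetical container for an uncertainty-aware diagnosis.

    Field names are illustrative; they are not the paper's actual schema.
    """
    diagnoses: list[str]          # probable diagnoses, most likely first
    diagnosis_rationale: str      # narrative explanation tied to findings in the note
    is_uncertain: bool            # True when evidence is insufficient for a definite diagnosis
    uncertainty_rationale: str    # which criteria are unmet or which findings are missing

example = DiagnosticOutput(
    diagnoses=["community-acquired pneumonia"],
    diagnosis_rationale="Fever, productive cough, and focal crackles support pneumonia.",
    is_uncertain=True,
    uncertainty_rationale="No chest imaging is documented, so the diagnosis cannot be confirmed.",
)
```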

To achieve this, the paper introduces three key contributions:

  1. Task Formalization: Diagnostic uncertainty is explicitly defined, enabling computational models to identify uncertain cases and to explain why the available evidence is insufficient.
  2. Dataset Creation: Richly annotated clinical datasets were assembled, capturing varying levels of diagnostic ambiguity, to rigorously evaluate the reliability of diagnostic models.
  3. Fine-Tuning Approach: Open-source LLMs were fine-tuned with diagnostic criteria to align them with professional clinical reasoning, improving adherence to guidelines and supporting systematic recognition of uncertainty (a sketch of one possible training-example format follows this list).
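
The paper's training-data format is not reproduced in this summary, so the following is only a minimal sketch of how diagnostic criteria could be folded into instruction-style fine-tuning examples; the prompt template and dictionary keys are assumptions, not the authors' exact setup.

```python
def build_training_example(note: str, criteria: str, target: dict) -> dict:
    """Assemble one instruction-tuning example pairing a clinical note with
    the relevant diagnostic criteria (hypothetical format)."""
    prompt = (
        "You are a clinical diagnostic assistant.\n"
        f"Diagnostic criteria:\n{criteria}\n\n"
        f"Clinical note:\n{note}\n\n"
        "List the probable diagnoses, explain your reasoning, and state "
        "whether the evidence is sufficient for a definite diagnosis."
    )
    completion = (
        f"Diagnoses: {', '.join(target['diagnoses'])}\n"
        f"Reasoning: {target['diagnosis_rationale']}\n"
        f"Uncertain: {target['is_uncertain']}\n"
        f"Uncertainty explanation: {target['uncertainty_rationale']}"
    )
    return {"prompt": prompt, "completion": completion}
```

Pairs produced this way could then be fed to any standard supervised fine-tuning pipeline for open-source LLMs.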

Experimental Framework and Results

The paper employs clinical notes from MIMIC-IV and the University of Minnesota Clinical Data Repository (UMN-CDR) to test the robustness and generalization of ConfiDx. The evaluation is structured across four subtasks: disease diagnosis, diagnostic explanation, uncertainty recognition, and uncertainty explanation, with performance metrics such as diagnostic accuracy and BERTScore.
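
The exact evaluation pipeline is not described in detail here, but the two metric families mentioned above could be computed roughly as follows; the exact-match rule and the use of the public bert-score package are assumptions rather than the authors' protocol.

```python
# Rough sketch of the reported metric families: exact-match accuracy for the
# predicted diagnosis and BERTScore F1 for free-text explanations.
from bert_score import score


def diagnostic_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of cases where the predicted diagnosis matches the reference label."""
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in zip(predicted, reference))
    return hits / len(reference)


def explanation_bertscore(candidates: list[str], references: list[str]) -> float:
    """Mean BERTScore F1 between generated and reference explanations."""
    _, _, f1 = score(candidates, references, lang="en", verbose=False)
    return f1.mean().item()
```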

The results show that ConfiDx significantly outperforms off-the-shelf models in both diagnostic accuracy and uncertainty recognition. For example, the fine-tuned LLMs improved diagnostic accuracy by over 68.3% relative to baseline models, with statistically significant gains (p < 0.001). ConfiDx also achieved interpretative accuracy scores ranging from 25.3% to 43.8%, indicating stronger explanatory capacity than standard models.

Additionally, the robustness of ConfiDx was demonstrated on the MIMIC-U dataset, which consists of diseases absent from the training data. The fine-tuned LLMs delivered consistent gains, improving recognition of diagnostic uncertainty by up to 41.8% over baseline models.

Implications and Future Directions

The implications of this research are significant, particularly for the reliability and trustworthiness of diagnostic models. By integrating diagnostic criteria into model training, ConfiDx promises to increase the accuracy of automatic diagnoses while providing clinicians with the detailed explanations needed for patient care. This dual capability could reduce misdiagnoses and adverse outcomes, meaningfully improving clinical decision-making.

Moreover, the paper lays a foundation for future developments in AI by demonstrating that LLMs' training efficiency can be improved through expert-level task alignment. Future research could explore expanding these models' capabilities to include broader sets of medical specialties and integrating real-time data for dynamic diagnostic processing.

Conclusion

"Uncertainty-Aware LLMs for Explainable Disease Diagnosis" presents a decisive improvement in the field of automatic disease diagnosis. ConfiDx exemplifies a significant stride towards bridging the gap between computational models and clinical reliability, underscoring the importance of addressing diagnostic uncertainty with precision and thorough explanations. This research not only informs current AI implementations in healthcare but also propels the vision for developing more comprehensive diagnostic tools that cater to the nuances of clinical settings.