- The paper introduces the Chain-of-Diagnosis (CoD) framework that decomposes the diagnostic process into clear, interpretable steps to improve transparency in LLM reasoning.
- The paper demonstrates that a synthetic dataset of 48,020 cases covering 9,604 diseases enables superior diagnostic accuracy, with confidence estimates whose entropy falls as symptom inquiry proceeds.
- The paper highlights practical implications for scalable, trustworthy AI in clinical settings and suggests future integration with real-world diagnostic workflows.
Chain-of-Diagnosis: Enhancing Interpretability in LLM-based Medical Diagnostics
The paper "CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis" introduces Chain-of-Diagnosis (CoD), a novel framework aimed at enhancing the interpretability of LLMs in the domain of medical diagnostics. The authors present key contributions in terms of methodological advancements and empirical results, positioning CoD as a significant step toward transparent and controllable automated diagnostic systems.
Problem Context and Motivation
Medical diagnosis is a critical and complex task, involving both explicit symptoms reported by patients and implicit symptoms elicited through further inquiry. LLMs, with their strong reasoning and dialogue capabilities, are promising candidates for automating this process. However, the "black-box" nature of LLMs poses significant challenges for interpretability, trust, and ethical compliance. This paper addresses these limitations by proposing CoD, which transforms diagnosis into a transparent, traceable diagnostic chain that mimics a physician's reasoning pathway.
Methodological Overview
Chain-of-Diagnosis (CoD)
The CoD framework breaks the diagnostic process down into five steps to ensure interpretability and transparency (a minimal pipeline sketch follows the list):
- Symptom Abstraction: Summarizes the patient's explicit symptoms, streamlining the information the LLM needs to process.
- Disease Recall: Leverages a disease retriever to identify the top-K candidate diseases based on the abstracted symptoms.
- Diagnostic Reasoning: Generates explicit diagnostic reasoning over the candidate diseases.
- Confidence Assessment: Produces a confidence distribution for the candidate diseases, indicating the model’s diagnostic confidence.
- Decision Making: Uses a confidence threshold to either confirm a diagnosis or inquire about additional symptoms, balancing accuracy and efficiency.
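To make the control flow concrete, here is a minimal, runnable Python sketch of the five steps. Everything in it is an illustrative assumption rather than the paper's implementation: `toy_retriever` stands in for the trained disease retriever, the abstraction and reasoning steps (LLM calls in the paper) are stubbed, and the confidence distribution is approximated with a softmax over retrieval scores.

```python
"""Minimal sketch of the five CoD steps; all names are assumptions for illustration."""
import math

CONFIDENCE_THRESHOLD = 0.5  # assumed value; the paper tunes this to trade accuracy for inquiry rounds
TOP_K = 5

def toy_retriever(symptom_summary: str, k: int = TOP_K) -> list[tuple[str, float]]:
    """Stand-in for the trained disease retriever: returns (disease, score) pairs."""
    corpus = {
        "influenza": 2.1, "common cold": 1.8, "covid-19": 1.5,
        "allergic rhinitis": 0.6, "strep throat": 0.4,
    }
    return sorted(corpus.items(), key=lambda kv: -kv[1])[:k]

def softmax(scores: list[float]) -> list[float]:
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def chain_of_diagnosis(explicit_symptoms: str) -> dict:
    # Step 1: symptom abstraction (an LLM call in the paper; stubbed here).
    summary = explicit_symptoms.strip().lower()

    # Step 2: disease recall via the retriever (top-K candidates).
    candidates = toy_retriever(summary)

    # Step 3: diagnostic reasoning (free-text LLM generation in the paper; stubbed).
    reasoning = f"Comparing {summary!r} against {[d for d, _ in candidates]}"

    # Step 4: confidence assessment as a distribution over candidates.
    confidences = softmax([score for _, score in candidates])

    # Step 5: decision making against the confidence threshold.
    best = max(range(len(candidates)), key=lambda i: confidences[i])
    if confidences[best] >= CONFIDENCE_THRESHOLD:
        return {"action": "diagnose", "disease": candidates[best][0],
                "confidence": confidences[best], "reasoning": reasoning}
    return {"action": "inquire", "candidates": candidates,
            "confidences": confidences, "reasoning": reasoning}

print(chain_of_diagnosis("fever, cough, and sore throat for three days"))
```

The threshold at step 5 is what makes the agent controllable: lowering it yields faster diagnoses, while raising it triggers additional rounds of symptom inquiry.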
Data Synthesis and Training
Because privacy concerns make real-world patient cases difficult to acquire, the authors synthesized patient cases from a comprehensive disease database derived from medical encyclopedias. This yielded a training dataset of 48,020 synthetic cases covering 9,604 diseases, enabling scalable and ethical model training (a sketch of the resulting data shape follows).
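In the paper, an LLM generates these cases from encyclopedia entries; the sketch below only illustrates the resulting data shape, splitting a disease's symptoms into "explicit" ones the simulated patient volunteers and "implicit" ones revealed only on inquiry. The database entry and field names are assumptions made for this example.

```python
import json
import random

# Toy encyclopedia-style entry; the field names are assumptions for this sketch.
DISEASE_DB = {
    "influenza": {
        "typical_symptoms": ["fever", "cough", "muscle aches", "fatigue", "sore throat"],
    },
}

def synthesize_case(disease: str, seed: int = 0) -> dict:
    """Build one synthetic case: some symptoms are 'explicit' (volunteered by
    the patient), the rest 'implicit' (revealed only if the model asks).
    Random sampling here just illustrates the structure; the paper uses an LLM."""
    rng = random.Random(seed)
    symptoms = DISEASE_DB[disease]["typical_symptoms"]
    n_explicit = rng.randint(1, max(1, len(symptoms) - 1))
    explicit = rng.sample(symptoms, n_explicit)
    implicit = [s for s in symptoms if s not in explicit]
    return {"disease": disease, "explicit_symptoms": explicit,
            "implicit_symptoms": implicit}

print(json.dumps(synthesize_case("influenza"), indent=2))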
Empirical Results
The performance of DiagnosisGPT, the LLM developed using CoD, was evaluated against several benchmarks (Muzhi, Dxy, and the newly created DxBench). Key findings include:
- Superior Diagnostic Accuracy: DiagnosisGPT outperforms other advanced LLMs in diagnostic benchmarks, achieving higher accuracy through effective symptom inquiry and reasoning.
- Confidence-Driven Decision Making: Diagnostic accuracy rises as the confidence threshold is raised, indicating that the model's reported confidence is informative and reliable.
- Entropy Reduction: Each round of symptom inquiry lowers the entropy of the confidence distribution, so the diagnosis becomes more certain as the dialogue proceeds, supporting more efficient and accurate diagnoses (see the worked example below).
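As a worked example of the entropy-reduction claim, the snippet below computes the Shannon entropy of two hypothetical confidence distributions, one before and one after a round of symptom inquiry. The numbers are illustrative, not taken from the paper.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in bits) of a confidence distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical confidence distributions over five candidate diseases,
# before and after one round of symptom inquiry.
before = [0.30, 0.25, 0.20, 0.15, 0.10]
after  = [0.70, 0.15, 0.08, 0.05, 0.02]

print(f"entropy before inquiry: {entropy(before):.3f} bits")  # ~2.228 bits
print(f"entropy after inquiry:  {entropy(after):.3f} bits")   # ~1.391 bits
# The drop in entropy quantifies how much the answer narrowed the diagnosis.
```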
Implications and Future Directions
Practical Implications
- Enhanced Trust and Acceptability: By providing transparency in diagnostic reasoning and confidence levels, CoD can significantly improve the trust and acceptability of LLMs in clinical settings.
- Scalability and Privacy: The use of synthetic cases based on disease encyclopedias ensures scalable data availability without privacy and ethical concerns, facilitating wider adoption of automated diagnostic systems.
Theoretical Implications
- Interpretable AI: CoD contributes to the growing body of research on interpretable AI by introducing a framework that not only improves transparency but also ensures controllability and reliability in high-stakes applications.
- Entropy in Diagnostics: Using entropy to guide symptom inquiry and reduce diagnostic uncertainty is an innovative approach that could extend to other areas requiring decision-making under uncertainty (one plausible formalization is sketched below).
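One plausible way to formalize entropy-guided inquiry, written here in assumed notation rather than the paper's exact formulation: let $p$ be the confidence distribution over candidate diseases $d$, $s$ a candidate symptom to ask about, and $a$ the patient's answer.

```latex
% Assumed notation, not the paper's: choose the next symptom s* that
% maximizes the expected reduction in the entropy H of the confidence
% distribution p over candidate diseases d.
\[
  H(p) = -\sum_{d} p(d)\,\log p(d)
\]
\[
  s^{*} = \arg\max_{s}\;\Big[\, H(p) \;-\; \mathbb{E}_{a}\, H\big(p(\cdot \mid s = a)\big) \,\Big]
\]
```

In words: ask about the symptom whose expected answer most reduces the entropy of the diagnostic confidence distribution, i.e., the question with the highest information gain.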
Future Developments
- Broader Disease Coverage: Expanding the disease database to include more conditions and rare diseases will enhance the model's applicability in diverse clinical scenarios.
- Real-World Validation: Further validation of DiagnosisGPT in real-world clinical settings will be essential to assess its practical utility and to refine the model based on actual patient interactions.
- Integration with Clinical Workflows: Developing interfaces and tools that seamlessly integrate DiagnosisGPT into clinical workflows will be crucial for effective deployment and user adoption.
Conclusion
The paper "CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis" proposes a transformative approach to enhance the interpretability and reliability of LLM-based medical diagnosis. By structuring the diagnostic process into an interpretable chain and leveraging synthetic data for scalable training, the authors have developed DiagnosisGPT, a model that sets a new benchmark in automated medical diagnostics. The success of CoD underscores the importance of transparency and controllability in AI systems, paving the way for more trustworthy and effective medical AI applications.