
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation

Published 28 Aug 2023 in cs.CL and cs.AI | arXiv:2308.14346v1

Abstract: We propose DISC-MedLLM, a comprehensive solution that leverages LLMs to provide accurate and truthful medical response in end-to-end conversational healthcare services. To construct high-quality Supervised Fine-Tuning (SFT) datasets, we employ three strategies: utilizing medical knowledge-graphs, reconstructing real-world dialogues, and incorporating human-guided preference rephrasing. These datasets are instrumental in training DISC-MedLLM, surpassing existing medical LLMs in both single-turn and multi-turn consultation scenarios. Extensive experimental results demonstrate the effectiveness of the proposed model in bridging the gap between general LLMs and real-world medical consultation. Additionally, we release the constructed dataset and model weights to further contribute to research and development. Further details and resources can be found at https://github.com/FudanDISC/DISC-MedLLM

Citations (49)

Summary

  • The paper presents DISC-MedLLM, a two-stage supervised fine-tuning approach that enhances LLM performance for accurate medical consultations.
  • It employs medical knowledge graphs, dialogue reconstruction, and human preference rephrasing to create a reliable training dataset from real-world data.
  • Results show notable improvements over existing models, providing open-source resources to further advance AI-driven medical consultation research.

DISC-MedLLM: Bridging LLMs and Medical Consultation

The paper presents DISC-MedLLM, an approach that leverages LLMs to provide reliable and accurate medical responses within conversational healthcare settings. The research aims to address the gap between the capabilities of general LLMs and the nuanced requirements of medical consultation.

Methodology

DISC-MedLLM employs a two-stage Supervised Fine-Tuning (SFT) process to enhance LLM performance in medical contexts. The model training uses a custom dataset built upon several strategies:

  1. Medical Knowledge Graphs: The researchers utilize department-oriented sampling from medical knowledge graphs to create knowledge-intensive QA pairs. This ensures that the dataset is grounded in reliable medical information.
  2. Dialogue Reconstruction: Real-world dialogues from medical forums are adapted using GPT-3.5 for better linguistic alignment. This reconstruction removes informal language and inconsistencies, improving data quality for effective model training.
  3. Human Preference Rephrasing: A carefully curated subset of real-world dialogues was refined using human preferences to align the model's behavior with desired conversational standards and effectiveness.
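The first strategy above, turning a knowledge graph into knowledge-intensive QA pairs via department-oriented sampling, can be sketched roughly as follows. This is an illustrative sketch only: the toy triples, department labels, and question templates are assumptions for demonstration, not the paper's actual graph or pipeline.

```python
import random
from collections import defaultdict

# Toy stand-in for a medical knowledge graph: (head, relation, tail, department).
# These triples are illustrative, not drawn from the paper's knowledge graph.
TRIPLES = [
    ("asthma", "has_symptom", "wheezing", "respiratory"),
    ("asthma", "treated_by", "inhaled corticosteroids", "respiratory"),
    ("gastritis", "has_symptom", "epigastric pain", "gastroenterology"),
    ("gastritis", "treated_by", "proton pump inhibitors", "gastroenterology"),
]

# Hypothetical templates mapping a relation type to a (question, answer) pattern.
TEMPLATES = {
    "has_symptom": ("What are common symptoms of {h}?",
                    "A common symptom of {h} is {t}."),
    "treated_by": ("How is {h} typically treated?",
                   "{h} is commonly treated with {t}."),
}

def department_oriented_sample(triples, per_department, seed=0):
    """Sample triples evenly across departments, then render them as QA pairs."""
    rng = random.Random(seed)
    by_dept = defaultdict(list)
    for t in triples:
        by_dept[t[3]].append(t)
    qa_pairs = []
    for dept, dept_triples in sorted(by_dept.items()):
        k = min(per_department, len(dept_triples))
        for head, rel, tail, _ in rng.sample(dept_triples, k):
            q_tpl, a_tpl = TEMPLATES[rel]
            qa_pairs.append({"department": dept,
                             "question": q_tpl.format(h=head),
                             "answer": a_tpl.format(h=head, t=tail)})
    return qa_pairs

pairs = department_oriented_sample(TRIPLES, per_department=1)
for p in pairs:
    print(p["department"], "|", p["question"], "|", p["answer"])
```

Sampling per department rather than uniformly over the whole graph keeps under-represented specialties from being crowded out of the SFT data.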

Evaluation

The paper assesses the performance of DISC-MedLLM in both single-turn and multi-turn scenarios. In single-turn evaluation, a benchmark is created from multiple-choice questions derived from public medical datasets, focusing on accuracy. For multi-turn evaluation, GPT-3.5 simulates patient interactions, and GPT-4 evaluates the model’s performance based on proactivity, accuracy, helpfulness, and linguistic quality.
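The multi-turn judging step can be sketched as a simple aggregation over the four criteria named above. Note the 1-5 scale and per-dialogue averaging here are illustrative assumptions, not the paper's exact protocol; in practice each score dict would come from a GPT-4 judgment of one simulated consultation.

```python
from statistics import mean

# The four evaluation criteria from the paper's multi-turn setup.
CRITERIA = ("proactivity", "accuracy", "helpfulness", "linguistic_quality")

def aggregate_judge_scores(dialogue_scores):
    """Average per-criterion scores that a judge model (e.g. GPT-4)
    assigned to each simulated patient dialogue."""
    return {c: round(mean(s[c] for s in dialogue_scores), 2) for c in CRITERIA}

# Example: hypothetical judge scores (1-5 scale) for two simulated dialogues.
scores = [
    {"proactivity": 4, "accuracy": 5, "helpfulness": 4, "linguistic_quality": 5},
    {"proactivity": 3, "accuracy": 4, "helpfulness": 5, "linguistic_quality": 4},
]
summary = aggregate_judge_scores(scores)
print(summary)
```

Reporting each criterion separately, rather than a single overall score, makes it possible to see whether a model trades accuracy for conversational qualities such as proactivity.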

Results

DISC-MedLLM demonstrates a notable improvement over existing models, such as HuatuoGPT, in both multiple-choice accuracy and multi-turn dialogue evaluation. However, it still lags behind GPT-3.5 in some areas, indicating room for further refinement.

Implications and Future Work

This research illustrates the potential of integrating domain-specific knowledge with LLMs to enhance practical application in the medical field. The release of both the dataset and model weights contributes valuable resources for future work in AI healthcare.

Future developments could explore retrieval-augmented techniques to incorporate even broader knowledge bases, providing enhanced solutions for complex medical inquiries. Additionally, refining the alignment of LLMs with human-like empathy and response strategies could further elevate their utility in medical consultations.

In conclusion, DISC-MedLLM marks a significant step towards bridging the gap between general language understanding and domain-specific application, setting a foundation for future innovations in AI-driven medical consultation.
