
HuatuoGPT, towards Taming Language Model to Be a Doctor (2305.15075v1)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: In this paper, we present HuatuoGPT, an LLM for medical consultation. The core recipe of HuatuoGPT is to leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuned stage. The responses of ChatGPT are usually detailed, well-presented and informative while it cannot perform like a doctor in many aspects, e.g. for integrative diagnosis. We argue that real-world data from doctors would be complementary to distilled data in the sense the former could tame a distilled LLM to perform like doctors. To better leverage the strengths of both data, we train a reward model to align the LLM with the merits that both data bring, following an RLAIF (reinforced learning from AI feedback) fashion. To evaluate and benchmark the models, we propose a comprehensive evaluation scheme (including automatic and manual metrics). Experimental results demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets. It is worth noting that by using additional real-world data and RLAIF, the distilled LLM (i.e., HuatuoGPT) outperforms its teacher model ChatGPT in most cases. Our code, data, and models are publicly available at https://github.com/FreedomIntelligence/HuatuoGPT. The online demo is available at https://www.HuatuoGPT.cn/.

An Analysis of HuatuoGPT: A Medical LLM

The paper introduces HuatuoGPT, an LLM designed specifically for medical consultation. Its training leverages both data distilled from ChatGPT and real-world data from doctors. The combination targets a gap in existing models, which produce detailed, fluent responses but fall short of a doctor's diagnostic ability.

Model Development Approach

The core methodology is a hybrid data strategy during the supervised fine-tuning (SFT) phase. Data distilled from ChatGPT contributes fluency and a detailed, well-presented communication style, whereas real-world data from doctors contributes diagnostic precision and alignment with medical practice. The two sources are meant to be complementary: ChatGPT's general-purpose responses often omit the specific diagnostic questions that are crucial in a medical consultation, and the doctor data is intended to supply that behavior.
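The hybrid data strategy can be illustrated with a minimal sketch that mixes the two sources into one SFT set. The `Example` type, the `mix_sft_data` helper, and the `real_fraction` parameter are hypothetical names chosen for illustration; the summary does not state the mixing ratio or sampling procedure the authors used.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    response: str
    source: str  # "distilled" (from ChatGPT) or "real" (from doctors)


def mix_sft_data(distilled, real, size, real_fraction=0.5, seed=0):
    """Sample a shuffled SFT set of `size` examples.

    `real_fraction` is an illustrative knob, not a value from the paper.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(size * real_fraction)
    n_distilled = size - n_real
    if n_real > len(real) or n_distilled > len(distilled):
        raise ValueError("not enough examples for the requested mix")
    rng = random.Random(seed)
    # Sample without replacement from each source, then interleave by shuffling.
    mixed = rng.sample(real, n_real) + rng.sample(distilled, n_distilled)
    rng.shuffle(mixed)
    return mixed


distilled = [Example(f"q{i}", f"detailed answer {i}", "distilled") for i in range(10)]
real = [Example(f"q{i}", f"doctor reply {i}", "real") for i in range(10)]
mixed = mix_sft_data(distilled, real, size=8, real_fraction=0.25)
```

Keeping the `source` field on each example makes it easy to audit the realized mix or to vary the ratio in an ablation.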

To further align HuatuoGPT with both patient-friendly and doctor-like responses, the model is trained with an RLAIF protocol (the abstract expands this as "reinforced learning from AI feedback"). A reward model is trained to balance correctness, informativeness, logical consistency, and diagnostic capability, with the aim of mitigating typical auto-regressive LLM issues such as hallucination and non-inquisitive responses.
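As a rough illustration of how a reward model can be fit to preference pairs, the sketch below computes the pairwise ranking loss commonly used for reward models, -log(sigmoid(r_chosen - r_rejected)). This is an assumption about the general technique, not a confirmed detail of HuatuoGPT's training; the function name and the scalar-reward inputs are hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Illustrative only: the rewards here are plain floats standing in for
    the scalar outputs of a reward model.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need equally sized, non-empty reward lists")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)); this form avoids overflow.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


# The loss shrinks as the preferred response is scored further above the other.
loss_good = pairwise_ranking_loss([2.0], [0.0])
loss_bad = pairwise_ranking_loss([0.0], [2.0])
```

Minimizing this loss pushes the reward of the preferred response above that of the rejected one; the resulting scalar reward can then drive a policy-optimization step.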

Evaluation and Results

The evaluation framework combines automatic and manual metrics across several benchmarks, including the CBLUE, cMedQA2, webMedQA, and Huatuo-26M datasets. In these assessments, HuatuoGPT achieves state-of-the-art performance among open-source models on medical consultation tasks and surpasses the baseline ChatGPT in several settings.

Key results include:

  • Superior performance on Chinese medical QA datasets such as cMedQA2, with notable gains in BLEU, ROUGE, and Distinct metrics.
  • GPT-4 and human evaluators prefer HuatuoGPT's responses over those of GPT-3.5-turbo in over 60% of evaluated cases, which suggests it combines diagnostic proficiency with conversational fluency.
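Two of the quantities above are simple to compute, and a minimal sketch may help make them concrete. Distinct-n is the ratio of unique n-grams to total n-grams across a set of responses, and the preference figure is a win rate over pairwise judgments. The function names, the pre-tokenized input, and the way ties are counted are assumptions for illustration; the summary does not specify the authors' tokenization or tie handling.

```python
def distinct_n(responses, n=2):
    """Unique n-grams divided by total n-grams over tokenized responses.

    `responses` is a list of token lists; for Chinese text the tokens
    could be characters or segmenter output (an assumption here).
    """
    unique = set()
    total = 0
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def win_rate(judgments, model):
    """Fraction of pairwise judgments won by `model`; ties count as non-wins."""
    if not judgments:
        raise ValueError("no judgments given")
    return sum(1 for winner in judgments if winner == model) / len(judgments)


diversity = distinct_n([["a", "b", "a", "b"]], n=2)
rate = win_rate(["huatuogpt", "huatuogpt", "gpt-3.5-turbo", "tie"], "huatuogpt")
```

A higher Distinct-n indicates less repetitive output, so it complements overlap metrics such as BLEU and ROUGE, which reward similarity to a reference answer.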

Implications and Future Directions

The implications of HuatuoGPT extend to practical healthcare applications, where equitable access to medical expertise can be significantly enhanced via online platforms. This functionality is crucial in addressing healthcare disparities, particularly in under-resourced regions.

On a theoretical level, HuatuoGPT offers insights into integrating LLMs with domain-specific expertise, emphasizing the possible transformation that LLMs can bring to medical domains. This aligns with the broader trend of ‘intelligent medicine,’ poised to redefine traditional healthcare delivery.

Future work could explore improved fine-tuning methods to strengthen HuatuoGPT’s diagnostic capabilities. Expanding the training data to cover more diverse healthcare practices, such as indigenous medicine systems, might also improve its adaptability across cultural contexts. The paper also highlights ethical considerations, emphasizing the importance of accuracy and responsibility when deploying AI in healthcare.

In summary, HuatuoGPT narrows the gap between general-purpose LLMs and the nuanced demands of medical consultation. Its combination of distilled data, real-world doctor data, and RLAIF offers a template for adapting LLMs to other specialized domains.

Authors (13)
  1. Hongbo Zhang (54 papers)
  2. Junying Chen (26 papers)
  3. Feng Jiang (97 papers)
  4. Fei Yu (76 papers)
  5. Zhihong Chen (63 papers)
  6. Jianquan Li (18 papers)
  7. Guiming Chen (4 papers)
  8. Xiangbo Wu (8 papers)
  9. Zhiyi Zhang (31 papers)
  10. Qingying Xiao (5 papers)
  11. Xiang Wan (93 papers)
  12. Benyou Wang (109 papers)
  13. Haizhou Li (285 papers)
Citations (136)