ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences (2311.06025v3)

Published 10 Nov 2023 in cs.CL

Abstract: Recently, the increasing demand for superior medical services has highlighted discrepancies in medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective NLP solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models have shown promising results in this domain, and current LLMs offer an advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), which, although it efficiently empowers LLMs to understand and respond to medical instructions, is ineffective at learning domain knowledge and aligning with human preferences. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, which undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general-domain LLMs. Furthermore, we analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients, so as to contribute to the further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

ChiMed-GPT: A Transformer for Chinese Medical Text Processing

The paper presents ChiMed-GPT, an LLM designed specifically for the Chinese medical domain, trained with a comprehensive regime to improve its performance and alignment with human preferences. It highlights the need to address existing limitations of LLMs in medical text processing, particularly context length and domain-knowledge acquisition, through a training protocol that combines pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).

At the core of ChiMed-GPT is Ziya-13B-v2, a Transformer-based model that processes contexts of up to 4,096 tokens, a significant improvement over the 2,048-token limit found in many LLMs. The paper makes the critical observation that restricted context length is a barrier to effective NLP in the medical domain, where long, detailed texts are integral.
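To make the context-length point concrete, the sketch below shows the workaround a shorter window forces: long clinical notes must be split into overlapping chunks, which risks severing entities and dialogue turns at chunk boundaries, whereas a 4,096-token window halves how often this is necessary. This is an illustrative sketch only; the token ids stand in for the output of the model's tokenizer, and the overlap size is an assumed parameter rather than anything specified in the paper.

```python
# Illustrative only: split a long medical record into windows that fit a fixed
# context length, with overlap so that entities straddling a boundary are not
# lost. `token_ids` stands in for the output of the model's tokenizer.
from typing import List

SHORT_CONTEXT = 2048   # the window many LLMs are limited to
LONG_CONTEXT = 4096    # ChiMed-GPT's reported window
OVERLAP = 256          # assumed overlap; not a value from the paper

def chunk_token_ids(token_ids: List[int], max_len: int,
                    overlap: int = OVERLAP) -> List[List[int]]:
    """Split token ids into overlapping windows no longer than max_len."""
    if len(token_ids) <= max_len:
        return [token_ids]
    chunks, start, step = [], 0, max_len - overlap
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        start += step
    return chunks

# A 10,000-token discharge summary: 6 chunks at 2,048 tokens vs. 3 at 4,096.
record = list(range(10_000))
print(len(chunk_token_ids(record, SHORT_CONTEXT)))  # 6
print(len(chunk_token_ids(record, LONG_CONTEXT)))   # 3
```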

The training regime begins with continued pre-training on the Chinese Medical Dataset (CMD), ensuring thorough exposure to domain-specific knowledge. SFT is then carried out on a variety of curated datasets comprising medical dialogues and question-answer pairs. Notably, the SFT stage incorporates safety prompts to mitigate potentially harmful content generation.
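As a rough illustration of what an SFT example with a safety prompt might look like, the snippet below prepends a safety instruction to a medical question before pairing it with the supervised response. The template text and field names are assumptions made for exposition; the paper's actual prompts and data format are not reproduced here.

```python
# Rough sketch of building one SFT example with a prepended safety prompt.
# The template text and field names are assumptions for exposition; the
# paper's actual prompts and data format may differ.
SAFETY_PREFIX = (
    "You are a careful medical assistant. Refuse to provide harmful or "
    "unverified medical advice and recommend consulting a physician when "
    "appropriate."
)

def build_sft_example(instruction: str, response: str) -> dict:
    """Pair a safety-prefixed prompt with its supervised target response."""
    prompt = f"{SAFETY_PREFIX}\n\nInstruction: {instruction}\nResponse:"
    return {"prompt": prompt, "completion": " " + response}

example = build_sft_example(
    instruction="What are common side effects of metformin?",
    response="Common side effects include nausea, diarrhea, and abdominal "
             "discomfort; severe reactions warrant medical attention.",
)
print(example["prompt"])
```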

ChiMed-GPT is further aligned with human preferences through RLHF, implemented via rejection-sampling fine-tuning. This approach enhances the model's ability to generate contextually appropriate and helpful responses in medical interactions.
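The sketch below gives the textbook form of rejection-sampling fine-tuning: sample several candidate responses per prompt, score them with a reward model, and keep only the best one as a new supervised target. The `generate` and `reward_score` callables are placeholders, not the paper's actual models or pipeline.

```python
# Textbook form of rejection-sampling fine-tuning: draw k candidate responses
# per prompt, score them with a reward model, and keep only the best one as a
# new supervised target. `generate` and `reward_score` are placeholders for
# the actual policy and reward models.
from typing import Callable, List, Tuple

def rejection_sample(prompts: List[str],
                     generate: Callable[[str], str],
                     reward_score: Callable[[str, str], float],
                     k: int = 8) -> List[Tuple[str, str]]:
    """Return (prompt, best_response) pairs for a further round of SFT."""
    selected = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        best = max(candidates, key=lambda r: reward_score(prompt, r))
        selected.append((prompt, best))
    return selected

# The selected pairs are then used like ordinary SFT data: the policy is
# fine-tuned on them with the standard next-token cross-entropy loss.
```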

Evaluation and Results

The model’s efficacy is systematically evaluated across critical tasks:

  • Information Extraction: Tested on named entity recognition, ChiMed-GPT achieves superior F1 scores, outperforming general and medical domain baselines.
  • Question Answering: On open-ended and multi-choice QA datasets, including ChiMed, C-Eval, CMMLU, and MedQA, ChiMed-GPT demonstrates heightened accuracy and response quality, particularly in real-world medical scenarios.
  • Dialogue Generation: Its performance in multi-turn dialogue generation, assessed with metrics such as BLEU and ROUGE, is notably strong, suggesting practical applicability in patient-doctor interactions (a brief sketch of the evaluation metrics follows this list).
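For reference, the snippet below spells out the two headline metric families in their standard textbook form: entity-level F1 for information extraction and a unigram (ROUGE-1 style) overlap score for generated dialogue. These are generic formulations, not the paper's exact evaluation scripts.

```python
# Standard formulations of the headline metrics: entity-level F1 for the NER
# task and a unigram (ROUGE-1 style) overlap F-score for generated dialogue.
# These are generic definitions, not the paper's exact evaluation scripts.
from collections import Counter

def entity_f1(predicted: set, gold: set) -> float:
    """F1 over predicted vs. gold entity spans, e.g. (label, start, end) tuples."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def rouge1_f(candidate: list, reference: list) -> float:
    """Unigram-overlap F-score between tokenized candidate and reference."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(candidate), overlap / len(reference)
    return 2 * p * r / (p + r)

print(entity_f1({("drug", 3, 5)}, {("drug", 3, 5), ("symptom", 10, 12)}))  # ~0.667
print(rouge1_f("患者 需要 休息".split(), "该 患者 需要 多 休息".split()))   # 0.75
```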

Bias Analysis

The paper also examines the model's potential biases using attitude scales such as CAMI (Community Attitudes toward the Mentally Ill) and MICA (Mental Illness: Clinicians' Attitudes). ChiMed-GPT's relatively low bias scores underscore its adherence to responsible content generation, an essential property for technology deployed in sensitive domains like healthcare.
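One way such scales can be administered to an LLM is to present each item as a forced-choice Likert prompt and map the answer back to a numeric score. The sketch below is hypothetical: the item wording, response scale, and `ask_model` callable are placeholders, not the instruments' actual items or the paper's pipeline.

```python
# Hypothetical sketch of administering a Likert-style attitude item to a model
# and mapping the answer to a numeric score, in the spirit of the CAMI/MICA
# evaluations. The item wording, response scale, and `ask_model` callable are
# placeholders, not the instruments' actual items or the paper's pipeline.
from typing import Callable

LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

def score_item(statement: str, ask_model: Callable[[str], str]) -> int:
    """Ask the model to rate one scale item and return its Likert score."""
    prompt = (
        f"Statement: {statement}\n"
        "Respond with exactly one of: strongly disagree, disagree, neutral, "
        "agree, strongly agree."
    )
    answer = ask_model(prompt).strip().lower()
    return LIKERT.get(answer, 3)  # fall back to neutral if unparsable

# Averaging item scores (with reverse-coded items flipped) yields the scale
# score that is then compared against baseline LLMs.
```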

Implications and Future Directions

Practically, ChiMed-GPT represents a significant advance for automated medical support systems, with the potential to improve healthcare accessibility and efficiency. Theoretically, by making the case for carefully trained domain-specific LLMs, the paper points toward further work on longer context processing and finer-grained alignment, offering direction for future LLM development.

In conclusion, the paper offers a sound exploration of how to accommodate domain-specific needs with LLMs while mitigating engineering barriers such as context constraints and misalignment with end-user expectations. ChiMed-GPT serves as a benchmark for navigating the complexities of NLP applications in healthcare, pairing large-scale data processing with the intricacies of human interaction in Chinese medical contexts.

References (51)
  1. Language Models are Few-shot Learners. Advances in neural information processing systems, 33:1877–1901.
  2. On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations. arXiv preprint arXiv:2203.13928.
  3. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504–1532, Toronto, Canada.
  4. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality.
  5. ChatLaw: Open-source Legal Large Language Model with Integrated External Knowledge Bases. arXiv preprint arXiv:2306.16092.
  6. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Advances in Neural Information Processing Systems, 35:16344–16359.
  7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  8. ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4729–4740.
  9. From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11737–11762, Toronto, Canada.
  10. Mental Illness: Clinicians’ Attitudes (MICA) Scale—Psychometric Properties of a Version for Healthcare Students and Professionals. Psychiatry Research, 206(1):81–87.
  11. Ziya2: Data-centric Learning is All LLMs Need. arXiv preprint arXiv:2311.03301.
  12. OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. arXiv preprint arXiv:2309.03876.
  13. MedAlpaca–An Open-Source Collection of Medical Conversational AI Models and Training Data. arXiv preprint arXiv:2304.08247.
  14. Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA. arXiv preprint arXiv:2003.03875.
  15. Measuring Massive Multitask Language Understanding. arXiv preprint arXiv:2009.03300.
  16. C-Eval: A Multi-level Multi-Discipline Chinese Evaluation Suite for Foundation Models. arXiv preprint arXiv:2305.08322.
  17. What Disease does This Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. Applied Sciences, 11(14):6421.
  18. BART: Denoising Sequence-to-sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461.
  19. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online.
  20. CMMLU: Measuring Massive Multitask Language Understanding in Chinese. arXiv preprint arXiv:2306.09212.
  21. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. Cureus, 15(6).
  22. Ilya Loshchilov and Frank Hutter. 2017. Decoupled Weight Decay Regularization. arXiv preprint arXiv:1711.05101.
  23. BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. Briefings in Bioinformatics, 23(6):bbac409.
  24. James Manyika. 2023. An Overview of BARD: an Early Experiment with Generative AI. Technical report, Technical report, Google AI.
  25. Mixed Precision Training. arXiv preprint arXiv:1710.03740.
  26. OpenAI. 2023. GPT-4 Technical Report. ArXiv, abs/2303.08774.
  27. Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
  28. Language Models are Unsupervised Multitask Learners. OpenAI blog, 1(8):9.
  29. Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
  30. ZeRO: Memory Optimizations toward Training Trillion Parameter Models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–16. IEEE.
  31. Neural Machine Translation of Rare Words with Subword Units. arXiv preprint arXiv:1508.07909.
  32. Megatron-LM: Training Multi-billion Parameter Language Models using Model Parallelism. arXiv preprint arXiv:1909.08053.
  33. Towards Expert-level Medical Question Answering with Large Language Models. arXiv preprint arXiv:2305.09617.
  34. Summarizing Medical Conversations via Identifying Important Utterances. In Proceedings of the 28th International Conference on Computational Linguistics, pages 717–729.
  35. ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders. arXiv preprint arXiv:2105.01279.
  36. Safety Assessment of Chinese Large Language Models. arXiv preprint arXiv:2304.10436.
  37. Stanford Alpaca: An Instruction-following LLaMA Model. GitHub repository.
  38. S Martin Taylor and Michael J Dear. 1981. Scaling Community Attitudes toward the Mentally Ill. Schizophrenia bulletin, 7(2):225–240.
  39. ChiMed: A Chinese Medical Corpus for Question Answering. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 250–260, Florence, Italy.
  40. ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5654–5664, Marseille, France.
  41. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971.
  42. Llama 2: Open Foundation and Fine-tuned Chat Models. arXiv preprint arXiv:2307.09288.
  43. Attention is All You Need. Advances in neural information processing systems, 30.
  44. BioMedLM: a Domain-specific Large Language Model for Biomedical Text. MosaicML. Accessed: Dec, 23(3):2.
  45. HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge. arXiv preprint arXiv:2304.06975.
  46. BloombergGPT: A Large Language Model for Finance. arXiv preprint arXiv:2303.17564.
  47. Baize: An Open-source Chat Model with Parameter-efficient Tuning on Self-chat Data. arXiv preprint arXiv:2304.01196.
  48. CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility. arXiv preprint arXiv:2307.09705.
  49. Ming Xu. 2023. MedicalGPT: Training Medical GPT Model.
  50. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems 32, pages 5753–5763.
  51. Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence. CoRR, abs/2209.02970.
Authors (5)
  1. Yuanhe Tian (15 papers)
  2. Ruyi Gan (14 papers)
  3. Yan Song (91 papers)
  4. Jiaxing Zhang (39 papers)
  5. Yongdong Zhang (119 papers)
Citations (21)