Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback (2401.05695v2)
Abstract: The use of LLMs in medical dialogue generation has garnered significant attention, with a focus on improving response quality and fluency. While previous studies have made progress in optimizing model performance for single-round medical Q&A tasks, multi-round conversations still suffer from logical inconsistencies, so the model's capability in this setting needs to be strengthened. To address this, we propose preference learning from process feedback (PLPF), an approach that integrates physicians' diagnostic logic into LLMs. PLPF comprises rule modeling, preference data generation, and preference alignment, which together train the model to adhere to the diagnostic process. Experimental results using Standardized Patient Testing show that PLPF improves the diagnostic accuracy of the baseline model in medical conversations by 17.6%, outperforming traditional reinforcement learning from human feedback. PLPF is also effective on both multi-round and single-round dialogue tasks, demonstrating its potential for improving medical dialogue generation.
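To make the three-stage pipeline concrete, here is a minimal Python sketch of a PLPF-style data flow. Everything in it is an illustrative assumption rather than the paper's implementation: the toy rule, the dialogue-turn fields, and the helper names (`rule_score`, `make_preference_pair`, `dpo_loss`) are invented for exposition, and the alignment step uses a DPO-style objective as one plausible way to realize preference alignment.

```python
import math

# Stage 1: rule modeling. A toy stand-in for diagnostic-process rules:
# a good turn gathers symptoms before committing to a diagnosis and does
# not contradict facts already established earlier in the dialogue.
def rule_score(turn: dict) -> int:
    score = 0
    if turn["asks_before_diagnosing"]:
        score += 1
    if not turn["contradicts_history"]:
        score += 1
    return score

# Stage 2: preference data generation. Sample several candidate responses
# for the same dialogue state and rank them by rule score to form a
# (chosen, rejected) preference pair.
def make_preference_pair(candidates: list[dict]) -> tuple[dict, dict]:
    ranked = sorted(candidates, key=rule_score, reverse=True)
    return ranked[0], ranked[-1]

# Stage 3: preference alignment. A DPO-style loss on sequence log-probs;
# beta limits how far the policy drifts from the frozen reference model.
def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

if __name__ == "__main__":
    candidates = [
        {"text": "Could you describe when the pain started?",
         "asks_before_diagnosing": True, "contradicts_history": False},
        {"text": "It is definitely the flu; just rest.",
         "asks_before_diagnosing": False, "contradicts_history": False},
    ]
    chosen, rejected = make_preference_pair(candidates)
    print("chosen:", chosen["text"])
    # Placeholder log-probabilities; in practice these come from the policy
    # and reference LLMs scoring the chosen/rejected responses.
    print("loss:", dpo_loss(-12.0, -14.0, -12.5, -13.0))
```

On this reading, the rule model plays the role that a learned reward model plays in conventional RLHF: preferences are derived from how well each candidate follows the diagnostic process rather than from pairwise human labels, which is what makes the feedback "process feedback".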
Authors: Chengfeng Dou, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhenwei Tao