MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding (2403.06611v1)

Published 11 Mar 2024 in cs.CL and cs.AI

Abstract: With appropriate data selection and training techniques, LLMs have demonstrated exceptional success in various medical examinations and multiple-choice questions. However, the application of LLMs in medical dialogue generation, a task more closely aligned with actual medical practice, has been less explored. This gap is attributed to the insufficient medical knowledge of LLMs, which leads to inaccuracies and hallucinated information in the generated medical responses. In this work, we introduce the Medical dialogue with Knowledge enhancement and clinical Pathway encoding (MedKP) framework, which integrates an external knowledge enhancement module through a medical knowledge graph and an internal clinical pathway encoding via medical entities and physician actions. Evaluated with comprehensive metrics, our experiments on two large-scale, real-world online medical consultation datasets (MedDG and KaMed) demonstrate that MedKP surpasses multiple baselines and mitigates the incidence of hallucinations, achieving a new state-of-the-art. Extensive ablation studies further reveal the effectiveness of each component of MedKP. This enhancement advances the development of reliable, automated medical consultation responses using LLMs, thereby broadening the potential accessibility of precise and real-time medical assistance.
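
The abstract describes a two-part conditioning scheme: an external knowledge enhancement step that retrieves facts from a medical knowledge graph, and an internal clinical pathway encoding that traces the medical entities and physician actions in the dialogue so far, with both signals supplied to the LLM when it generates the next response. The sketch below is a minimal illustration of that flow under stated assumptions; the names (`Turn`, `retrieve_knowledge`, `encode_clinical_pathway`, `build_prompt`) and the knowledge-graph interface (`kg.neighbors`) are hypothetical and do not reflect the authors' implementation.

```python
# Illustrative sketch of a MedKP-style pipeline (not the paper's code):
# 1) retrieve facts linked to mentioned entities from a medical knowledge graph,
# 2) encode the dialogue history as a clinical pathway of entities and physician actions,
# 3) assemble both into the prompt that conditions the LLM's next response.

from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str          # "patient" or "doctor"
    text: str
    entities: List[str]   # medical entities mentioned in this turn
    action: str           # physician action, e.g. "inquire", "diagnose", "prescribe"


def retrieve_knowledge(kg, entities: List[str], top_k: int = 5) -> List[str]:
    """External knowledge enhancement: collect facts attached to the mentioned entities.

    `kg.neighbors(entity)` is an assumed interface returning a list of fact strings.
    """
    facts: List[str] = []
    for entity in entities:
        facts.extend(kg.neighbors(entity))
    return facts[:top_k]


def encode_clinical_pathway(history: List[Turn]) -> str:
    """Internal clinical pathway: a compact trace of physician actions over entities."""
    steps = [
        f"{turn.action}({', '.join(turn.entities)})"
        for turn in history
        if turn.speaker == "doctor"
    ]
    return " -> ".join(steps)


def build_prompt(history: List[Turn], kg) -> str:
    """Combine retrieved knowledge, the pathway trace, and the dialogue into one prompt."""
    entities = [e for turn in history for e in turn.entities]
    knowledge = retrieve_knowledge(kg, entities)
    pathway = encode_clinical_pathway(history)
    dialogue = "\n".join(f"{turn.speaker}: {turn.text}" for turn in history)
    return (
        "Relevant medical knowledge:\n- " + "\n- ".join(knowledge) + "\n\n"
        "Clinical pathway so far: " + pathway + "\n\n"
        "Dialogue:\n" + dialogue + "\n\ndoctor:"
    )
```

In this reading, the retrieved facts are meant to ground the response in external medical knowledge (reducing hallucination), while the pathway trace keeps the generated turn consistent with the consultation's prior entities and actions.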

Authors (4)
  1. Jiageng Wu (16 papers)
  2. Xian Wu (139 papers)
  3. Yefeng Zheng (197 papers)
  4. Jie Yang (516 papers)
Citations (2)