PlugMed: Improving Specificity in Patient-Centered Medical Dialogue Generation using In-Context Learning (2305.11508v2)
Abstract: Patient-centered medical dialogue systems aim to provide diagnostic interpretation services to users with limited medical knowledge, emphasizing responses that are specific to each patient. Despite their promising performance on some medical tasks, LLMs struggle to guarantee the specificity of their responses. Inspired by in-context learning, we propose PlugMed, a Plug-and-Play Medical Dialogue System, to address this challenge. PlugMed is equipped with two modules, a prompt generation (PG) module and a response ranking (RR) module, which enhance the dialogue strategies of LLMs and thereby improve the specificity of the dialogue. The PG module stimulates the imitative ability of LLMs by providing them with real dialogues from similar patients as prompts. The RR module uses a fine-tuned small model as a response filter to select appropriate responses from those generated by the LLMs. Furthermore, we introduce a new evaluation method that matches both the user's intent and high-frequency medical terms to effectively assess the specificity of responses. We conduct experiments on three medical dialogue datasets, and the results of both automatic and human evaluation demonstrate the effectiveness of our approach.
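To make the two-stage design concrete, the sketch below illustrates a PlugMed-style pipeline under simplifying assumptions; it is not the authors' implementation. The PG module's retrieval of similar patient dialogues is approximated with cosine similarity over precomputed embeddings, and the LLM sampler (`llm_generate`) and the fine-tuned RR scorer (`score_response`) are passed in as hypothetical callables that only mark where those components plug in.

```python
# Illustrative sketch of a PlugMed-style pipeline (not the authors' code).
# Assumptions: dialogue embeddings are precomputed; `llm_generate` and
# `score_response` are hypothetical stand-ins for the LLM and the
# fine-tuned response-ranking model described in the paper.
from typing import Callable, List, Tuple

import numpy as np


def retrieve_similar_dialogues(
    query_vec: np.ndarray,
    corpus: List[Tuple[str, np.ndarray]],
    k: int = 3,
) -> List[str]:
    """PG module (sketch): return the k stored dialogues whose embeddings
    are most similar (cosine) to the current patient query."""
    scored = []
    for text, vec in corpus:
        sim = float(
            np.dot(query_vec, vec)
            / (np.linalg.norm(query_vec) * np.linalg.norm(vec) + 1e-8)
        )
        scored.append((sim, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(examples: List[str], patient_utterance: str) -> str:
    """Concatenate retrieved dialogues as in-context examples before the
    current patient turn, so the LLM can imitate the doctors' strategies."""
    demos = "\n\n".join(f"[Example dialogue]\n{d}" for d in examples)
    return f"{demos}\n\n[Current patient]\n{patient_utterance}\n[Doctor]"


def plugmed_respond(
    patient_utterance: str,
    query_vec: np.ndarray,
    corpus: List[Tuple[str, np.ndarray]],
    llm_generate: Callable[[str, int], List[str]],
    score_response: Callable[[str, str], float],
    n_candidates: int = 5,
) -> str:
    """End-to-end sketch: PG builds the prompt, the LLM samples several
    candidate replies, and the RR module keeps the highest-scoring one."""
    examples = retrieve_similar_dialogues(query_vec, corpus)
    prompt = build_prompt(examples, patient_utterance)
    candidates = llm_generate(prompt, n_candidates)
    return max(candidates, key=lambda c: score_response(patient_utterance, c))
```

In the paper the retrieval step and the scorer are realized with domain-specific models; the cosine similarity and callable placeholders above only indicate where those components fit in the flow.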
Authors:
- Chengfeng Dou
- Zhi Jin
- Wenping Jiao
- Haiyan Zhao
- Zhenwei Tao
- Yongqiang Zhao