Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges (2405.10630v1)

Published 17 May 2024 in cs.CL and cs.AI

Abstract: This paper surveys and organizes research on medical dialogue systems, an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has so far been absent. As a result, overviews of the categories, methods, and evaluation of medical dialogue systems remain limited and underspecified, hindering further progress in this area. To fill this gap, we examine an initial pool of 325 papers from well-known computer science and natural language processing conferences and journals and provide an overview. Recently, large language models (LLMs) have shown strong capacity on downstream tasks, which has also reshaped the foundations of medical dialogue systems. Despite their appealing practical value, current medical dialogue systems still suffer from problems. To this end, this paper lists the grand challenges of medical dialogue systems, especially those of LLMs.

Authors (9)
  1. Xiaoming Shi
  2. Zeming Liu
  3. Li Du
  4. Yuxuan Wang
  5. Hongru Wang
  6. Yuhang Guo
  7. Tong Ruan
  8. Jie Xu
  9. Shaoting Zhang