BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation (2403.19414v1)
Abstract: Medical dialogue generation (MDG) has gained increasing attention due to its substantial practical value. Previous works typically employ a sequence-to-sequence framework to generate medical responses, modeling dialogue context as sequential text with annotated medical entities. While these methods succeed in generating fluent responses, they offer no explanation of the reasoning process and require extensive entity annotation. To address these limitations, we propose Bootstrap Prompting for Explicit Reasoning in MDG (BP4ER), which explicitly models MDG's multi-step reasoning process and iteratively enhances it. We employ a least-to-most prompting strategy to guide an LLM in explicit reasoning, breaking MDG down into simpler sub-questions, where each sub-question builds on the answers to previous ones. Additionally, we introduce two distinct bootstrapping techniques for prompting, which autonomously correct errors and facilitate the LLM's explicit reasoning. This approach eliminates the need for entity annotation and increases the transparency of the MDG process by explicitly generating the intermediate reasoning chain. Experimental findings on two public datasets indicate that BP4ER outperforms state-of-the-art methods on both objective and subjective evaluation metrics.
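To make the decomposition concrete, below is a minimal Python sketch of the least-to-most prompting loop the abstract describes. The `llm` callable and the three sub-questions are illustrative assumptions for this sketch, not the paper's actual prompts or decomposition.

```python
# Minimal sketch of least-to-most prompting for medical dialogue
# generation (MDG). The `llm` callable and the three sub-questions
# below are assumptions for illustration, not the paper's prompts.

from typing import Callable, List

def least_to_most_mdg(dialogue_history: str,
                      llm: Callable[[str], str]) -> str:
    """Answer simpler sub-questions in order, feeding each answer
    into the next prompt, and return the final doctor response."""
    # Hypothetical decomposition: easier sub-questions come first.
    sub_questions: List[str] = [
        "Q1: What symptoms does the patient describe?",
        "Q2: Given those symptoms, what is the likely condition?",
        "Q3: Given the symptoms and condition, what should the doctor say next?",
    ]
    answers: List[str] = []
    for question in sub_questions:
        # Each prompt contains the dialogue plus all earlier answers,
        # so later sub-questions build on the answers to previous ones.
        context = "\n".join(answers)
        prompt = f"{dialogue_history}\n{context}\n{question}"
        answers.append(llm(prompt))
    # The intermediate answers form the explicit reasoning chain;
    # the last answer is the generated medical response.
    return answers[-1]
```

Chaining the answers this way makes the intermediate reasoning chain explicit, which is what lets the approach dispense with entity annotation in the paper's framing.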
Authors: Yuhong He, Yongqi Zhang, Shizhu He, Jun Wan