Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines (2401.11120v2)
Abstract: Background: LLMs, enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied. Methods: We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Prompting (CoT-FSP). To evaluate the effectiveness of the proposed methods, we create a set of synthetic patient descriptions and conduct both automatic and human evaluation of the responses generated by four LLMs: GPT-4, GPT-3.5 Turbo, LLaMA, and PaLM 2. Zero-Shot Prompting (ZSP) was used as the baseline method. We focus on CDS for COVID-19 outpatient treatment as the case study. Results: All four LLMs exhibit improved performance when enhanced with CPGs compared to the baseline ZSP. BDT outperformed both CoT-FSP and PAGC in automatic evaluation. All of the proposed methods demonstrated high performance in human evaluation. Conclusion: LLMs enhanced with CPGs demonstrate superior performance, as compared to plain LLMs with ZSP, in providing accurate recommendations for COVID-19 outpatient treatment, which also highlights the potential for broader applications beyond the case study.
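The abstract names a Binary Decision Tree (BDT) method but does not say how the tree is encoded or passed to the model. The following is only an illustrative sketch, under assumptions, of one way a guideline could be held as a binary tree of yes/no questions and flattened into prompt text. The names `Node`, `Leaf`, `render`, `build_prompt`, and `EXAMPLE_TREE` are hypothetical, and the criteria and options are placeholders, not content from any guideline or from the paper.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Terminal node holding a recommendation label (placeholder text)."""
    recommendation: str


@dataclass
class Node:
    """Internal node: one yes/no question with a subtree per answer."""
    question: str
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def render(tree: Union[Node, Leaf], depth: int = 0) -> List[str]:
    """Flatten the tree into indented IF/ELSE lines (assumed prompt format)."""
    pad = "  " * depth
    if isinstance(tree, Leaf):
        return [f"{pad}RECOMMEND: {tree.recommendation}"]
    lines = [f"{pad}IF {tree.question} THEN"]
    lines += render(tree.yes, depth + 1)
    lines.append(f"{pad}ELSE")
    lines += render(tree.no, depth + 1)
    return lines


def build_prompt(tree: Union[Node, Leaf], patient_description: str) -> str:
    """Combine the flattened tree with a patient description into one prompt."""
    return "\n".join(
        ["Follow this decision tree:"]
        + render(tree)
        + ["", f"Patient: {patient_description}", "Recommendation:"]
    )


# Placeholder tree: the criteria and options are NOT taken from any guideline.
EXAMPLE_TREE = Node(
    question="criterion A is met",
    yes=Leaf("option 1"),
    no=Node(
        question="criterion B is met",
        yes=Leaf("option 2"),
        no=Leaf("option 3"),
    ),
)

if __name__ == "__main__":
    print(build_prompt(EXAMPLE_TREE, "synthetic patient description goes here"))
```

In this sketch, a zero-shot baseline would correspond to sending only the patient description, without the flattened tree; that mapping is likewise an assumption rather than something the abstract specifies.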
Authors: David Oniani, Xizhi Wu, Shyam Visweswaran, Sumit Kapoor, Shravan Kooragayalu, Katelyn Polanska, Yanshan Wang