Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models (2405.16413v1)
Abstract: Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning based predictive models. Recent advances in large language models (LLMs) demonstrate an unprecedented capability to encode knowledge and perform reasoning, which gives them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by using the few-shot inference capability of LLMs on cases where traditional supervised learning methods (SLs) may perform poorly. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and of LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health & Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that the proposed approach effectively combines the strengths of SLs and LLMs and significantly improves predictive performance. This advance holds promise for revolutionizing ADRD screening and early-detection practices, with potential implications for patient-management strategies and, ultimately, for improving healthcare.
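The confidence-driven collaboration between SLs and LLMs can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the confidence score (distance of the SL probability from 0.5, rescaled to [0, 1]), the `threshold` value, the few-shot prompt template in the hypothetical `build_few_shot_prompt`, and the keyword-based `toy_llm` stand-in, which a real pipeline would replace with a call to an actual LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Routed:
    """Final label for one patient and which model produced it."""
    label: int   # 1 = predicted ADRD onset, 0 = no predicted onset
    source: str  # "SL" or "LLM"


def confidence(p: float) -> float:
    """Assumed confidence score: how far the SL probability is from a coin
    flip, rescaled to [0, 1]. The paper's actual measure may differ."""
    return abs(p - 0.5) * 2.0


def route(
    sl_probs: Sequence[float],
    records: Sequence[str],
    llm_predict: Callable[[str], int],
    threshold: float = 0.6,  # hypothetical value, not taken from the paper
) -> List[Routed]:
    """Keep the supervised model's label on confident (clear-cut) cases and
    defer low-confidence cases to the LLM."""
    if len(sl_probs) != len(records):
        raise ValueError("sl_probs and records must have the same length")
    out: List[Routed] = []
    for p, rec in zip(sl_probs, records):
        if confidence(p) >= threshold:
            out.append(Routed(label=int(p >= 0.5), source="SL"))
        else:
            out.append(Routed(label=llm_predict(rec), source="LLM"))
    return out


def build_few_shot_prompt(query: str, demos: Sequence[Tuple[str, int]]) -> str:
    """Hypothetical few-shot template: labeled example patients followed by
    the query patient, with a Yes/No answer slot left open."""
    lines = ["Predict whether each patient will develop ADRD. Answer Yes or No.", ""]
    for rec, label in demos:
        lines.append(f"Patient: {rec}\nAnswer: {'Yes' if label else 'No'}\n")
    lines.append(f"Patient: {query}\nAnswer:")
    return "\n".join(lines)


def toy_llm(record: str) -> int:
    """Stand-in for an LLM call. A real system might send
    build_few_shot_prompt(record, demos) to a model and parse Yes/No;
    this stub only applies a keyword rule so the sketch runs offline."""
    return int("memory loss" in record)


# Toy inputs for illustration only (not data from the paper).
records = [
    "age 81; memory loss; depression",
    "age 62; no chronic conditions",
    "age 74; memory loss; hypertension",
    "age 70; hypertension",
]
results = route([0.95, 0.10, 0.55, 0.48], records, toy_llm, threshold=0.6)
print([(r.label, r.source) for r in results])
# [(1, 'SL'), (0, 'SL'), (1, 'LLM'), (0, 'LLM')]
```

The first two patients have SL probabilities far from 0.5, so the supervised label is kept; the last two are near-ties and are deferred to the (stubbed) LLM. Raising `threshold` sends more cases to the LLM, trading inference cost for potentially better handling of ambiguous patients.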
Authors: Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou