CPLLM: Clinical Prediction with Large Language Models (2309.11295v2)

Published 20 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: We present Clinical Prediction with LLMs (CPLLM), a method that involves fine-tuning a pre-trained LLM for clinical disease and readmission prediction. We utilized quantization and fine-tuned the LLM using prompts. For diagnosis prediction, we predict whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results to various baselines, including RETAIN and Med-BERT, the current state-of-the-art model for disease prediction using temporal structured EHR data. In addition, we evaluated CPLLM for patient hospital readmission prediction and compared our method's performance with benchmark baselines. Our experiments have shown that our proposed method, CPLLM, surpasses all the tested models in terms of PR-AUC and ROC-AUC metrics, showing state-of-the-art results for diagnosis prediction and patient hospital readmission prediction. Such a method can be easily implemented and integrated into the clinical process to help care providers estimate the next steps of patients.

References (56)
  1. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
  2. Large language models are few-shot clinical information extractors. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp.  1998–2022, 2022.
  3. Falcon-40B: an open large language model with state-of-the-art performance. Findings of the Association for Computational Linguistics: ACL, 2023:10755–10773, 2023.
  4. Leo Breiman. Random forests. Machine learning, 45:5–32, 2001.
  5. Boosting transformers and language models for clinical prediction in immunotherapy. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pp.  332–340, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-industry.32. URL https://aclanthology.org/2023.acl-industry.32.
  6. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. Advances in neural information processing systems, 29, 2016.
  7. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning, pp. 233–240, 2006.
  8. QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314, 2023.
  9. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  10. Anne Elixhauser. Clinical Classifications Software (CCS) 2009. http://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp, 2009.
  11. Clinical Classifications Software (CCS). US Agency for Healthcare Research and Quality, 2014.
  12. GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694, 2020.
  13. A survey on text classification algorithms: From text to predictions. Information, 13(2):83, 2022.
  14. Patient event sequences for predicting hospitalization length of stay. In International Conference on Artificial Intelligence in Medicine, pp.  51–56. Springer, 2023.
  15. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
  16. Applied logistic regression, volume 398. John Wiley & Sons, 2013.
  17. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pp. 2790–2799. PMLR, 2019.
  18. Health system-scale language models are all-purpose prediction engines. Nature, pp.  1–6, 2023.
  19. MIMIC-IV. PhysioNet. Available online at: https://physionet.org/content/mimiciv/1.0/ (accessed August 23, 2021), 2020.
  20. Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA, 277(6):488–494, 1997.
  21. BEHRT: transformer for electronic health records. Scientific reports, 10(1):7155, 2020.
  22. Hi-BEHRT: Hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records. IEEE journal of biomedical and health informatics, 27(2):1106–1117, 2022a.
  23. Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences. arXiv preprint arXiv:2201.11838, 2022b.
  24. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  25. ClinicalT5: A generative language model for clinical text. In Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 5436–5443, 2022.
  26. ConCare: Personalized clinical feature embedding via capturing the healthcare context. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 833–840, 2020.
  27. Ready or not! Here comes ICD-10. Journal of neurointerventional surgery, 5(1):86–91, 2013.
  28. Health data analytics using scalable logistic regression with stochastic gradient descent. International Journal of Advanced Intelligence Paradigms, 10(1-2):118–132, 2018.
  29. Bidirectional representation learning from transformers using multimodal electronic health record data to predict depression. IEEE Journal of Biomedical and Health Informatics, 25(8):3121–3129, 2021.
  30. Augmented language models: a survey. ArXiv, 2023.
  31. Anatomical Therapeutic Chemical classification system (ATC). Dictionary of Pharmaceutical Medicine, pp. 8–8, 2009.
  32. Deepr: a convolutional net for medical records. IEEE journal of biomedical and health informatics, 21(1):22–30, 2016.
  33. OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  34. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  35. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific data, 5(1):1–13, 2018.
  36. Lutz Prechelt. Early stopping-but when? In Neural Networks: Tricks of the trade, pp.  55–69. Springer, 2002.
  37. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ digital medicine, 4(1):86, 2021.
  38. BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022.
  39. Federated learning of medical concepts embedding using BEHRT. arXiv preprint arXiv:2305.13052, 2023.
  40. Large language models encode clinical knowledge. Nature, pp.  1–9, 2023.
  41. HealthPrompt: A zero-shot learning paradigm for clinical natural language processing. In AMIA Annual Symposium Proceedings, volume 2022, pp. 972. American Medical Informatics Association, 2022.
  42. Language models are an effective representation learning technique for electronic health record data. Journal of biomedical informatics, 113:103637, 2021.
  43. Text classification via large language models. arXiv preprint arXiv:2305.08377, 2023.
  44. Large language models in medicine. Nature medicine, pp.  1–11, 2023.
  45. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023a.
  46. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.
  47. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  48. BioMedLM: a domain-specific large language model for biomedical text. MosaicML. Accessed Dec. 23, 2022.
  49. Clinical prediction rules: applications and methodological standards. New England Journal of Medicine, 313(13):793–799, 1985.
  50. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
  51. PyHealth: A deep learning toolkit for healthcare predictive modeling. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023, 2023a. URL https://github.com/sunlabuiuc/PyHealth.
  52. Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. arXiv preprint arXiv:2304.13712, 2023b.
  53. A large language model for electronic health records. NPJ Digital Medicine, 5(1):194, 2022.
  54. Almanac—retrieval-augmented language models for clinical medicine. NEJM AI, 1(2):AIoa2300068, 2024.
  55. GRASP: generic framework for health status representation learning based on incorporating knowledge from similar patients. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pp. 715–723, 2021.
  56. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.
Authors (2)
  1. Ofir Ben Shoham (4 papers)
  2. Nadav Rappoport (8 papers)
Citations (20)

Summary

Insights into CPLLM: Clinical Prediction with LLMs

The paper "CPLLM: Clinical Prediction with LLMs" addresses the integration of LLMs into clinical prediction tasks, focusing on disease and patient hospital readmission prediction. The authors propose a novel approach that fine-tunes pre-trained LLMs, using quantization and prompt-based training, to model sequential electronic health record (EHR) data.

Methodological Advancements

At the core of this research is the CPLLM framework, which leverages LLMs to predict clinical events by encoding patient histories as text sequences. The method applies to multiple tasks, including predicting a patient's next diagnosis and estimating the likelihood of hospital readmission within a set timeframe. Notably, fine-tuning these models requires no clinical-domain pre-training tasks, which distinguishes CPLLM from existing models such as Med-BERT.
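
To make the formulation concrete, here is a minimal sketch of how a patient's diagnosis history might be serialized into a prediction prompt. The template wording, function name, and example diagnoses are illustrative assumptions, not the paper's exact prompt.

```python
# Minimal sketch: serializing a diagnosis history into a text prompt for
# binary next-diagnosis prediction. Template wording and field names are
# illustrative assumptions, not the exact prompt used by CPLLM.

def build_prompt(diagnosis_history: list[str], target_disease: str) -> str:
    history_text = ", ".join(diagnosis_history)
    return (
        f"The patient's previous diagnoses, in order, were: {history_text}. "
        f"Will the patient be diagnosed with {target_disease} at the next "
        f"visit? Answer yes or no:"
    )

prompt = build_prompt(
    ["essential hypertension", "type 2 diabetes", "chronic kidney disease"],
    "acute and unspecified renal failure",
)
print(prompt)
```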

The researchers used two LLMs, Llama2 and BioMedLM, adapting them with QLoRA, a parameter-efficient fine-tuning technique that trains low-rank adapters on top of a quantized base model. Notably, the approach does not require auxiliary inputs such as length of stay (LOS) or explicit visit ordering, which are often difficult to obtain and are integral to models like Med-BERT.
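
The fine-tuning setup can be approximated with standard open-source tooling. The sketch below shows a QLoRA-style configuration using the Hugging Face transformers, peft, and bitsandbytes libraries; the checkpoint name, adapter hyperparameters, and target modules are plausible defaults rather than the paper's reported settings.

```python
# Sketch of QLoRA-style fine-tuning of Llama 2 for binary clinical
# classification. Checkpoint name, r, alpha, and target modules are
# assumed defaults, not the paper's reported configuration.
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bfloat16
)

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-13b-hf",            # gated checkpoint; assumed example
    num_labels=2,                           # outcome occurs / does not occur
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapters on attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # adapters are a tiny fraction
```

From here, training proceeds as ordinary supervised fine-tuning over the serialized patient prompts, with only the adapter weights updated.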

Data and Evaluation

The performance of CPLLM was benchmarked against state-of-the-art baselines on two well-known datasets: MIMIC-IV, drawn from a single medical center, and eICU-CRD, a multi-center critical-care database, together covering a wide range of ICD-9 and ICD-10 coded conditions. The evaluation metrics were PR-AUC and ROC-AUC, and CPLLM consistently surpassed the other models across tasks.
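
Both reported metrics are available in scikit-learn, which the paper's references include; PR-AUC is computed below as average precision, a standard estimator of the area under the precision-recall curve. The labels and scores are toy data for illustration only.

```python
# Computing the two reported metrics with scikit-learn on toy data.
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = [0, 1, 0, 1, 1, 0]               # ground-truth outcomes
y_score = [0.2, 0.7, 0.4, 0.9, 0.6, 0.1]  # predicted probability of outcome

pr_auc = average_precision_score(y_true, y_score)  # area under PR curve
roc_auc = roc_auc_score(y_true, y_score)           # area under ROC curve
print(f"PR-AUC: {pr_auc:.3f}, ROC-AUC: {roc_auc:.3f}")
```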

For instance, in predicting acute and unspecified renal failure, CPLLM achieved a PR-AUC of 45.442%, markedly higher than the baseline models. Similarly, for hospital readmission prediction, it outperformed the competing models by a notable margin on both datasets.

Implications and Future Directions

CPLLM presents a versatile framework that can be integrated into existing healthcare systems to provide predictive insights that could improve patient management strategies. Its ability to handle long input sequences, with context lengths far exceeding those of BERT-based models, means it can process long EHR histories without the heavy preprocessing that other models require.
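
As a rough illustration of that difference (BERT-style encoders are typically capped at 512 tokens, while Llama 2 accepts 4,096), one can tokenize a long synthetic history and check which context window it fits; the checkpoint name is an assumed example.

```python
# Checking how much coded history fits in a context window. 512 and 4096
# are the standard maximum sequence lengths of BERT and Llama 2; the
# checkpoint is a gated repo and an assumed example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")
history = "; ".join(["essential hypertension"] * 300)  # long synthetic record
n_tokens = len(tokenizer(history)["input_ids"])
print(f"{n_tokens} tokens | fits BERT (512): {n_tokens <= 512} | "
      f"fits Llama 2 (4096): {n_tokens <= 4096}")
```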

The practical implications include its adaptability to scenarios where detailed LOS data is unavailable, and to settings where rapid deployment matters more than a lengthy domain-specific pre-training stage.

Despite its strengths, CPLLM does require substantial computing resources for fine-tuning LLMs, which may present a barrier in resource-constrained environments. Additionally, the customization of prompts for different tasks raises questions about the general applicability of predefined prompts across diverse datasets.

Conclusion

In summary, "CPLLM: Clinical Prediction with LLMs" showcases an innovative application of LLMs in medicine, offering a method that is robust and flexible and that performs well against traditional clinical prediction models. Future work would benefit from integrating domain-specific retrieval augmentation to capitalize on the latest advances in LLMs and further improve prediction quality. This research sets a new benchmark for LLMs in clinical applications, marking a significant step beyond conventional natural language processing tasks.
