A Dataset and Benchmark for Hospital Course Summarization with Adapted Large Language Models (2403.05720v4)
Abstract: Brief hospital course (BHC) summaries are clinical documents that summarize a patient's hospital stay. While LLMs have demonstrated remarkable capabilities in automating real-world tasks, their ability to handle healthcare applications such as synthesizing BHCs from clinical notes has not been established. We introduce a novel pre-processed dataset, MIMIC-IV-BHC, comprising clinical note and BHC pairs for adapting LLMs to BHC synthesis. We also introduce a benchmark of the summarization performance of two general-purpose LLMs and three healthcare-adapted LLMs. Using clinical notes as input, we apply prompting-based (in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs (Clinical-T5-Large, Llama2-13B, FLAN-UL2) and two proprietary LLMs (GPT-3.5, GPT-4). We evaluate these LLMs across inputs of multiple context lengths using natural language similarity metrics. We further conduct a clinical reader study with five clinicians, comparing clinician-written and LLM-generated BHCs across 30 samples, focusing on their potential to enhance clinical decision-making through improved summary quality. We observe that the fine-tuned Llama2-13B outperforms the other domain-adapted models on the quantitative metrics BLEU and BERTScore, while GPT-4 with in-context learning is more robust than fine-tuned Llama2-13B to increasing context lengths of clinical note inputs. Despite comparable quantitative metrics, the reader study reveals a significant preference for summaries generated by GPT-4 with in-context learning over both fine-tuned Llama2-13B summaries and the original clinician-written summaries, highlighting the need for qualitative clinical evaluation.
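The abstract evaluates generated summaries against reference BHCs with natural language similarity metrics such as BLEU and BERTScore. Below is a minimal, hedged sketch of how such metrics can be computed, assuming the Hugging Face `evaluate` library; the example texts are illustrative placeholders, not drawn from MIMIC-IV-BHC, and this is not the authors' exact evaluation pipeline.

```python
import evaluate

# Load metric implementations from the `evaluate` library (assumption:
# the paper's exact metric configuration may differ).
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")

# Illustrative model output and clinician-written reference (hypothetical text).
predictions = ["The patient was admitted for chest pain and ruled out for myocardial infarction."]
references = ["Patient admitted with chest pain; myocardial infarction was ruled out."]

# BLEU: n-gram overlap between prediction and reference(s).
bleu_result = bleu.compute(predictions=predictions, references=[[r] for r in references])

# BERTScore: token-level semantic similarity using contextual embeddings.
bert_result = bertscore.compute(predictions=predictions, references=references, lang="en")

print(f"BLEU: {bleu_result['bleu']:.3f}")
print(f"BERTScore F1: {sum(bert_result['f1']) / len(bert_result['f1']):.3f}")
```

In practice, such scores would be averaged over the full test split of clinical note/BHC pairs and reported per model and adaptation strategy.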
- Asad Aali (8 papers)
- Dave Van Veen (11 papers)
- Yamin Ishraq Arefeen (1 paper)
- Jason Hom (6 papers)
- Christian Bluethgen (20 papers)
- Eduardo Pontes Reis (8 papers)
- Sergios Gatidis (35 papers)
- Namuun Clifford (1 paper)
- Joseph Daws (2 papers)
- Arash S. Tehrani (2 papers)
- Jangwon Kim (4 papers)
- Akshay S. Chaudhari (28 papers)