
How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling (2211.07713v1)

Published 25 Oct 2022 in cs.CL and cs.AI

Abstract: Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered by the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical note context. We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving a 10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables. Our code is available at https://github.com/HLTCHKUST/long-biomedical-model.
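The long-range adaptation the abstract describes relies on Longformer-style sparse attention: each token attends only to a fixed sliding window of neighbors, plus a few designated global tokens (e.g. [CLS]) that attend everywhere, so the cost grows linearly with note length instead of quadratically. A minimal sketch of such an attention mask, in plain NumPy (function and parameter names are illustrative, not the paper's actual API):

```python
import numpy as np

def longformer_attention_mask(seq_len, window, global_idx=()):
    """Boolean mask where mask[i, j] = True iff token i may attend to token j.

    Sliding-window local attention: each token sees `window` neighbors on
    each side. Tokens listed in `global_idx` get symmetric global attention
    (they attend to all positions, and all positions attend to them).
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True  # local sliding window around position i
    for g in global_idx:
        mask[g, :] = True  # global token attends to every position
        mask[:, g] = True  # every position attends to the global token
    return mask
```

With this pattern, a token participates in O(window) attention scores rather than O(seq_len), which is what makes thousands-of-word clinical notes tractable; the paper's "cut-off interval" question then becomes how much of the note to feed into such a model for a given prediction target.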

Authors (7)
  1. Samuel Cahyawijaya (75 papers)
  2. Bryan Wilie (24 papers)
  3. Holy Lovenia (30 papers)
  4. Huan Zhong (3 papers)
  5. MingQian Zhong (1 paper)
  6. Yuk-Yu Nancy Ip (1 paper)
  7. Pascale Fung (150 papers)
Citations (2)