Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain (2112.08754v3)

Published 16 Dec 2021 in cs.CL and cs.LG

Abstract: The field of NLP has recently seen a large change towards using pre-trained LLMs for solving almost any task. Despite showing great improvements in benchmark datasets for various tasks, these models often perform sub-optimal in non-standard domains like the clinical domain where a large gap between pre-training documents and target documents is observed. In this paper, we aim at closing this gap with domain-specific training of the LLM and we investigate its effect on a diverse set of downstream tasks and settings. We introduce the pre-trained CLIN-X (Clinical XLM-R) LLMs and show how CLIN-X outperforms other pre-trained transformer models by a large margin for ten clinical concept extraction tasks from two languages. In addition, we demonstrate how the transformer model can be further improved with our proposed task- and language-agnostic model architecture based on ensembles over random splits and cross-sentence context. Our studies in low-resource and transfer settings reveal stable model performance despite a lack of annotated data with improvements of up to 47 F1 points when only 250 labeled sentences are available. Our results highlight the importance of specialized LLMs as CLIN-X for concept extraction in non-standard domains, but also show that our task-agnostic model architecture is robust across the tested tasks and languages so that domain- or task-specific adaptations are not required.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Lukas Lange (31 papers)
  2. Heike Adel (51 papers)
  3. Jannik Strötgen (23 papers)
  4. Dietrich Klakow (114 papers)
Citations (14)