
Cross-Lingual Knowledge Transfer for Clinical Phenotyping (2208.01912v1)

Published 3 Aug 2022 in cs.CL

Abstract: Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies to perform this task for clinics that do not use the English language and have only a small amount of in-domain data available. We evaluate these strategies for a Greek and a Spanish clinic, leveraging clinical notes from different clinical domains such as cardiology, oncology and the ICU. Our results reveal two strategies that outperform the state-of-the-art: (1) translation-based methods in combination with domain-specific encoders, and (2) cross-lingual encoders plus adapters. We find that these strategies perform especially well for classifying rare phenotypes, and we advise on which method to prefer in which situation. Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.

Authors (8)
  1. Jens-Michalis Papaioannou (7 papers)
  2. Paul Grundmann (5 papers)
  3. Betty van Aken (10 papers)
  4. Athanasios Samaras (1 paper)
  5. Ilias Kyparissidis (1 paper)
  6. George Giannakoulas (1 paper)
  7. Felix Gers (1 paper)
  8. Alexander Löser (21 papers)
Citations (6)