
Comparative Analysis of Text Classification Approaches in Electronic Health Records (2005.06624v1)

Published 8 May 2020 in cs.CL and cs.LG

Abstract: Text classification tasks that aim to harvest or organize information from electronic health records are pivotal to supporting clinical and translational research. However, these tasks present specific challenges compared to other classification tasks, notably due to the particular nature of the medical lexicon and language used in clinical records. Recent advances in embedding methods have shown promising results for several clinical tasks, yet there is no exhaustive comparison of such approaches with other commonly used word representations and classification models. In this work, we analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks. The results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can match or exceed the performance of more recent ones based on contextual embeddings such as BERT.

Authors (7)
  1. Aurelie Mascio (4 papers)
  2. Zeljko Kraljevic (11 papers)
  3. Daniel Bean (8 papers)
  4. Richard Dobson (22 papers)
  5. Robert Stewart (19 papers)
  6. Rebecca Bendayan (7 papers)
  7. Angus Roberts (13 papers)
Citations (43)