A Benchmark Evaluation of Clinical Named Entity Recognition in French (2403.19726v1)

Published 28 Mar 2024 in cs.CL, cs.AI, and q-bio.QM

Abstract: Background: Transformer-based language models have shown strong performance on many Natural Language Processing (NLP) tasks. Masked language models (MLMs) attract sustained interest because they can be adapted to different languages and sub-domains through training or fine-tuning on specific corpora while remaining lighter than modern LLMs. Recently, several MLMs have been released for the biomedical domain in French, and experiments suggest that they outperform standard French counterparts. However, no systematic evaluation comparing all models on the same corpora is available. Objective: This paper presents an evaluation of masked language models for biomedical French on the task of clinical named entity recognition. Material and methods: We evaluate the biomedical models CamemBERT-bio and DrBERT and compare them to the standard French models CamemBERT, FlauBERT and FrALBERT as well as multilingual mBERT using three publicly available corpora for clinical named entity recognition in French. The evaluation set-up relies on gold-standard corpora as released by the corpus developers. Results: Results suggest that CamemBERT-bio outperforms DrBERT consistently while FlauBERT offers competitive performance and FrALBERT achieves the lowest carbon footprint. Conclusion: This is the first benchmark evaluation of biomedical masked language models for French clinical entity recognition that compares model performance consistently on nested entity recognition using metrics covering performance and environmental impact.


Authors (4)

Nesrine Bannour
Christophe Servan
Aurélie Névéol
Xavier Tannier
