DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain (2402.13432v1)

Published 20 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The biomedical domain has sparked significant interest in the field of NLP, which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing the intrinsic qualities of PLMs to be assessed from various perspectives. Although still limited to a few languages, this initiative has been undertaken in the biomedical field, notably for English and Chinese. This limitation hampers the evaluation of the latest French biomedical models, as they are either assessed on a minimal number of tasks with non-standardized protocols or evaluated using general downstream tasks. To bridge this research gap and account for the unique sensitivities of French, we present the first-ever publicly available French biomedical language understanding benchmark, called DrBenchmark. It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question answering, semantic textual similarity, and classification. We evaluate 8 state-of-the-art masked language models (MLMs) pre-trained on general and biomedical-specific data, as well as English-specific MLMs, to assess their cross-lingual capabilities. Our experiments reveal that no single model excels across all tasks, while generalist models are sometimes still competitive.
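
For concreteness, the sketch below shows what evaluating one of these MLMs on a DrBenchmark-style task (here, named-entity recognition) could look like with the Hugging Face datasets and transformers libraries. The dataset ID "Dr-BERT/QUAERO", its "emea" configuration, and the "tokens"/"ner_tags" column names are assumptions for illustration, and camembert-base stands in for any of the 8 evaluated models; this is not the authors' exact protocol.

```python
# Minimal sketch: fine-tune a French MLM on a biomedical NER task.
# Assumed: dataset repo "Dr-BERT/QUAERO" with config "emea" and columns
# "tokens" (pre-split words) and "ner_tags" (per-word label IDs).
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

ds = load_dataset("Dr-BERT/QUAERO", "emea")  # assumed dataset ID/config
label_names = ds["train"].features["ner_tags"].feature.names

# camembert-base stands in for any of the evaluated general or biomedical MLMs.
tok = AutoTokenizer.from_pretrained("camembert-base")

def encode(batch):
    # Re-tokenize pre-split words into subwords and align the NER labels:
    # the first subword of each word keeps the word's tag, the rest (and
    # special tokens) are masked out of the loss with -100.
    enc = tok(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, aligned = None, []
        for w in enc.word_ids(i):
            aligned.append(-100 if w is None or w == prev else tags[w])
            prev = w
        enc["labels"].append(aligned)
    return enc

ds = ds.map(encode, batched=True, remove_columns=ds["train"].column_names)

model = AutoModelForTokenClassification.from_pretrained(
    "camembert-base", num_labels=len(label_names)
)
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="drbenchmark-quaero-ner",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=5e-5,
    ),
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    data_collator=DataCollatorForTokenClassification(tok),
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; a task metric such as
                           # seqeval F1 would be added via compute_metrics
```

Running the same recipe with each candidate checkpoint, one task at a time, is the kind of standardized protocol the benchmark argues for: the task data, splits, and metric stay fixed while only the pre-trained model varies.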

Authors (11)
  1. Yanis Labrak
  2. Adrien Bazoge
  3. Oumaima El Khettari
  4. Mickael Rouvier
  5. Pacome Constant dit Beaufils
  6. Natalia Grabar
  7. Beatrice Daille
  8. Solen Quiniou
  9. Emmanuel Morin
  10. Pierre-Antoine Gourraud
  11. Richard Dufour