- The paper presents a lightweight DAPT+LoRA pipeline that achieves state-of-the-art BioNER performance on 10 of 12 public datasets.
- It demonstrates a parameter-efficient adaptation of transformer models, reducing computational overhead and carbon footprint.
- The approach offers open-source accessibility, regulatory compliance, and a democratized platform for advancing clinical NLP research.
OpenMed NER: Domain-Adapted, Parameter-Efficient Transformers for Biomedical Named Entity Recognition
Introduction
OpenMed NER presents a suite of open-source, domain-adapted transformer models for biomedical named entity recognition (BioNER), targeting the extraction of structured information from unstructured clinical and biomedical text. The framework combines lightweight domain-adaptive pre-training (DAPT) with parameter-efficient Low-Rank Adaptation (LoRA) on strong transformer backbones (DeBERTa-v3, PubMedBERT, BioELECTRA). The models are evaluated across 12 public BioNER datasets, achieving new state-of-the-art (SOTA) results on 10, with significant improvements on both foundational and specialized corpora. The approach emphasizes computational efficiency, regulatory compliance, and open accessibility.
Methodology
Domain-Adaptive Pre-training with LoRA
OpenMed NER employs a three-stage pipeline:
- DAPT with LoRA: Instead of full-model adaptation, LoRA adapters (rank=16, α=32, dropout=0.05) are inserted into the attention layers of the backbone models. DAPT is performed on a 350k-passage corpus (PubMed, arXiv, MIMIC-III, ClinicalTrials.gov) using masked language modeling (MLM); a minimal sketch of this stage appears after this list. The process is highly efficient, requiring ~4 hours on a single NVIDIA A100-80GB GPU, and yields an 18–22.5% perplexity reduction on held-out biomedical text.
- Task-Specific Fine-Tuning: For each BioNER dataset, only the LoRA adapters and a new token-classification head are updated; the backbone remains frozen. Fine-tuning is rapid (3–6 minutes per dataset) and memory-efficient (<16GB VRAM), with early stopping based on dev set F1.
- Bayesian Hyper-parameter Optimization: A 40-trial Bayesian search (Optuna TPE, Ray Tune) is used to optimize learning rate, LoRA rank, and dropout, yielding robust configurations across datasets.
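As a rough illustration of the first two stages, the sketch below wires LoRA adapters (r=16, α=32, dropout=0.05) into a masked-language-modeling backbone with the Hugging Face transformers and peft libraries. The backbone name, corpus file, target module names, and training arguments are illustrative assumptions rather than the paper's exact configuration (which tunes its hyper-parameters via the Bayesian search described above).

```python
# Hedged sketch of DAPT with LoRA: adapters in the attention projections, backbone frozen,
# masked language modeling on a biomedical passage corpus. Paths and settings are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

backbone = "microsoft/deberta-v3-large"                 # one of the three candidate backbones
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForMaskedLM.from_pretrained(backbone)

# LoRA adapters only in the attention projections; the module names below are DeBERTa-v3 specific.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["query_proj", "key_proj", "value_proj"])
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()                      # only a small fraction of weights train

# Hypothetical file holding the ~350k DAPT passages, tokenized for MLM.
corpus = load_dataset("text", data_files={"train": "dapt_passages.txt"})["train"]
corpus = corpus.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments("dapt-lora", per_device_train_batch_size=16,
                           num_train_epochs=1, learning_rate=2e-4, fp16=True),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()
model.save_pretrained("dapt-lora-adapters")             # writes only the adapter weights
```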
Model Selection and Architecture
Three backbones are used: DeBERTa-v3-large, PubMedBERT-large, and BioELECTRA-large. For each dataset, the best-performing backbone is selected. The token-classification head is a single linear layer mapping the final hidden state to BIO label logits. LoRA adapters update ~1.4% of parameters, enabling modular, efficient deployment.
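A minimal sketch of this architecture, assuming the Hugging Face transformers and peft APIs and a toy disease label set, is shown below: AutoModelForTokenClassification supplies the single linear BIO head, and LoRA adapters are attached to the attention projections while the backbone stays frozen.

```python
# Hedged sketch of the task-specific model: frozen backbone, LoRA adapters in attention,
# and a freshly initialized linear head over BIO labels. The label set is an illustrative example.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForTokenClassification

labels = ["O", "B-Disease", "I-Disease"]                # toy BIO label set for one dataset
backbone = "microsoft/deberta-v3-large"                 # swapped per dataset for the best backbone

model = AutoModelForTokenClassification.from_pretrained(
    backbone,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# TaskType.TOKEN_CLS keeps the new classification head trainable alongside the adapters;
# everything else stays frozen (~1.4% of parameters updated, per the paper).
lora_cfg = LoraConfig(task_type=TaskType.TOKEN_CLS,
                      r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["query_proj", "key_proj", "value_proj"])
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```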
Training and Evaluation Protocol
- Data: 12 public BioNER datasets spanning chemicals, diseases, genes/proteins, species, anatomy, and clinical cell lines.
- Metrics: Entity-level micro-F1 (exact match), with statistical significance assessed via approximate randomization; a sketch of this evaluation appears after this list.
- Baselines: Comparisons include both open-source (BioBERT, PubMedBERT, KeBioLM, SciFive, DBGN, BioNerFlair, ConNER) and closed-source/commercial systems (BioMegatron, Spark NLP, BERN2).
- Reproducibility: Fixed random seeds and deterministic LoRA fine-tuning ensure negligible run-to-run variance (<0.1 F1).
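The snippet below sketches the evaluation protocol under stated assumptions: entity-level micro-F1 is computed with the seqeval library, and significance is estimated with a paired approximate-randomization test that swaps per-sentence predictions between two systems. The tag sequences are toy placeholders and the test is one standard formulation, not necessarily the paper's exact implementation.

```python
# Hedged sketch: entity-level micro-F1 (exact match) via seqeval, plus a paired
# approximate-randomization test for the F1 difference between two systems.
import random
from seqeval.metrics import f1_score

gold  = [["B-Chemical", "I-Chemical", "O"], ["B-Disease", "O"]]   # toy gold annotations
sys_a = [["B-Chemical", "I-Chemical", "O"], ["B-Disease", "O"]]   # candidate system
sys_b = [["B-Chemical", "O", "O"],          ["B-Disease", "O"]]   # baseline system

def approx_randomization(gold, a, b, trials=10_000, seed=13):
    """Two-sided p-value for the micro-F1 difference between systems a and b."""
    rng = random.Random(seed)
    observed = abs(f1_score(gold, a) - f1_score(gold, b))
    hits = 0
    for _ in range(trials):
        shuf_a, shuf_b = [], []
        for pred_a, pred_b in zip(a, b):        # swap each sentence's predictions with p = 0.5
            if rng.random() < 0.5:
                shuf_a.append(pred_b); shuf_b.append(pred_a)
            else:
                shuf_a.append(pred_a); shuf_b.append(pred_b)
        if abs(f1_score(gold, shuf_a) - f1_score(gold, shuf_b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)            # add-one smoothing avoids a zero p-value

print("micro-F1 (A):", f1_score(gold, sys_a))
print("micro-F1 (B):", f1_score(gold, sys_b))
print("p-value:", approx_randomization(gold, sys_a, sys_b, trials=1_000))
```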
Results
OpenMed NER achieves new SOTA micro-F1 on 10 of 12 datasets, with the following highlights:
- Foundational Benchmarks: +2.70 pp on BC5CDR-Disease, +1.88 pp on BC5CDR-Chem, +1.37 pp on BC4CHEMD, +1.39 pp on NCBI-Disease.
- Specialized Corpora: +5.39 pp on BC2GM (gene/protein), +9.72 pp on CLL (clinical cell line), +3.80 pp on Linnaeus (species).
- Efficiency: All training (DAPT + fine-tuning) completes in <12 hours on a single GPU, with a carbon footprint of <1.2 kg CO₂e per model.
- Open Source: All checkpoints are released under Apache 2.0, supporting local, on-premise deployment and regulatory compliance.
On JNLPBA and AnatEM, OpenMed NER is marginally below SOTA (–0.10 pp and –1.05 pp, respectively), with error analysis attributing this to domain shift (archaic nomenclature) and boundary detection challenges.
Discussion
Architectural and Empirical Insights
- Backbone Selection: DeBERTa-v3 excels on datasets with long, compositional entities thanks to its disentangled attention. PubMedBERT's domain-specific vocabulary better captures biomedical morphology, while BioELECTRA's replaced-token-detection (RTD) pre-training objective is advantageous for token-level tasks.
- DAPT+LoRA Synergy: Ablation studies show a 2–4% absolute F1 gain over using DAPT or LoRA in isolation, confirming their complementary effects.
- Adapter Modularity: LoRA enables rapid, auditable model updates—critical for clinical MLOps and regulatory traceability.
Limitations
- Nested/Discontinuous Entities: The BIO tagging scheme cannot represent overlapping or nested entities; span-based or pointer architectures are needed for full coverage (see the short illustration after this list).
- Domain and Language Coverage: Performance on noisy clinical notes and non-English corpora remains an open challenge.
- Entity Normalization: Current models perform recognition only; integration with entity linking (e.g., UMLS, MeSH) is a priority for downstream clinical utility.
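To make the first limitation concrete, the toy example below (a hypothetical phrase, not taken from the paper's corpora) shows why one flat BIO tag per token cannot encode two overlapping entities.

```python
# Toy illustration: in "human epidermal growth factor receptor 2", the full span is a
# gene/protein mention while the nested token "human" is a species mention. A single
# flat BIO sequence must pick one of the two readings.
tokens = ["human", "epidermal", "growth", "factor", "receptor", "2"]

gene_only    = ["B-Gene",    "I-Gene", "I-Gene", "I-Gene", "I-Gene", "I-Gene"]
species_only = ["B-Species", "O",      "O",      "O",      "O",      "O"]
# No single tag on "human" can mark it as both B-Species and I-Gene, which is why
# span-based or pointer-style decoders are needed for nested or overlapping entities.
```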
Regulatory and Environmental Considerations
- EU AI Act Compliance: OpenMed NER's open-source, on-premise deployability facilitates compliance with high-risk AI system requirements (risk management, data governance, human oversight).
- Sustainability: Parameter-efficient adaptation yields a low carbon footprint, making large-scale biomedical NLP accessible to resource-constrained institutions.
Implications and Future Directions
OpenMed NER demonstrates that strategic, parameter-efficient adaptation of strong open-source backbones can consistently outperform proprietary and resource-intensive systems in biomedical NER. This has direct implications for democratizing access to high-performance clinical NLP, supporting regulatory compliance, and reducing environmental impact.
Future research should address nested entity recognition, expand to multilingual and low-resource clinical domains, and integrate entity normalization. The modularity of LoRA adapters also opens avenues for continual learning, federated adaptation, and rapid response to emerging biomedical terminology.
Conclusion
OpenMed NER establishes a new standard for open, efficient, and high-performing biomedical NER. By combining DAPT and LoRA on strong transformer backbones, it achieves SOTA results across a broad spectrum of biomedical tasks with minimal computational resources. The open release of models and code provides the community with practical tools for research and clinical deployment, and sets a foundation for further advances in domain-adapted, parameter-efficient NLP for healthcare.