Efficient extraction of medication information from clinical notes: an evaluation in two languages (2502.03257v1)

Published 5 Feb 2025 in cs.CL and cs.IR

Abstract: Objective: To evaluate the accuracy, computational cost and portability of a new NLP method for extracting medication information from clinical narratives. Materials and Methods: We propose an original transformer-based architecture for the extraction of entities and their relations pertaining to patients' medication regimens. First, we used this approach to train and evaluate a model on French clinical notes, using a newly annotated corpus from Hôpitaux Universitaires de Strasbourg. Second, the portability of the approach was assessed by conducting an evaluation on clinical documents in English from the 2018 n2c2 shared task. Information extraction accuracy and computational cost were assessed by comparison with an available method using transformers. Results: On the relation extraction task itself, the proposed architecture achieves performance that is competitive with the state of the art on both French and English (F-measures 0.82 and 0.96 vs 0.81 and 0.95), while reducing the computational cost by a factor of 10. End-to-end (named entity recognition and relation extraction) F1 performance is 0.69 and 0.82 for the French and English corpora, respectively. Discussion: While an existing system developed for English notes was deployed in a French hospital setting with reasonable effort, we found that an alternative architecture offered end-to-end drug information extraction with comparable extraction performance and lower computational impact for both French and English clinical text processing. Conclusion: The proposed architecture can be used to extract medication information from clinical text with high performance and low computational cost, and is consequently suited to the usually limited IT resources of hospitals.

Authors (9)

Thibaut Fabacher
Erik-André Sauleau
Emmanuelle Arcay
Bineta Faye
Maxime Alter
Archia Chahard
Nathan Miraillet
Adrien Coulet
Aurélie Névéol

Summary

Objective: The paper aims to evaluate the accuracy, computational cost, and portability of a new NLP method designed for extracting medication information from clinical narratives. The core goal is to develop a method suitable for resource-constrained hospital environments.

Methods:

Architecture: The authors propose a novel transformer-based architecture for extracting entities and their relationships related to patients' medication regimens. This architecture is designed to classify all relationships simultaneously to reduce the computational burden. It involves tokenizing input sentences, generating contextual embeddings using a transformer, and using a dedicated embedding layer for token labels. These representations are combined and processed through multi-head self-attention and a deep neural network to classify relationships between token pairs.
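
The cost argument behind classifying all relationships simultaneously can be illustrated with a small count of encoder forward passes. This is a minimal sketch, not the authors' implementation: it assumes that the comparison method runs one encoder pass per candidate entity pair, whereas the joint scheme runs one pass per sentence and scores every pair from it. The function names and the entity counts are hypothetical.

```python
def pairwise_encoder_passes(entities_per_sentence):
    """Assumed baseline: one encoder pass per unordered candidate entity pair."""
    return sum(n * (n - 1) // 2 for n in entities_per_sentence)


def joint_encoder_passes(entities_per_sentence):
    """Joint scheme: one encoder pass per sentence that contains at least one pair."""
    return sum(1 for n in entities_per_sentence if n >= 2)


# Hypothetical entity counts for five sentences (not taken from either corpus).
entities_per_sentence = [2, 5, 8, 3, 1]

pairwise = pairwise_encoder_passes(entities_per_sentence)
joint = joint_encoder_passes(entities_per_sentence)
print(f"pairwise passes: {pairwise}, joint passes: {joint}")
```

Under this assumption the pairwise count grows quadratically with the number of entities in a sentence, while the joint count stays at one pass per sentence. The ratio obtained on toy counts like these says nothing about the speed-up measured in the paper.
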
Data:
- French Corpus (Corp-HUS): A newly annotated corpus of 715 French clinical notes (discharge summaries, paramedical and admission notes) from Strasbourg University Hospital. The notes are from patients diagnosed with probable rheumatoid arthritis.
- English Corpus (n2c2 2018): The 2018 n2c2 shared task dataset, which contains discharge notes annotated for entities like Drug, Strength, Form, Dosage, Frequency, Route, Duration, Reason, and ADE (Adverse Drug Event) and their relationships.
Annotation Scheme: The authors introduce the concept of "frames" to represent drugs and their attributes in a more structured way. A frame corresponds to a specific drug and its properties (strength, dose, route, duration). Relationships are then defined within these frames, allowing for a more precise and comprehensive representation of therapeutic regimens, especially when therapy adjustments occur over time.
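
One way to picture a frame is as a small record that groups a drug mention with its attributes. The sketch below is illustrative only: the class names, the relation labels and the example regimen are hypothetical and do not reproduce the paper's annotation guidelines. It shows how a dose change over time can be held as two frames for the same drug, and how optional same-frame relations between attributes could be derived.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "Drug", "Strength", "Frequency"
    text: str   # surface form found in the note


@dataclass
class Frame:
    drug: Entity
    attributes: List[Entity] = field(default_factory=list)

    def relations(self, same_frame: bool = False) -> List[Tuple[str, str, str]]:
        """Return (head, relation, tail) triples for this frame."""
        # One relation from the drug to each of its attributes.
        rels = [(self.drug.text, f"Drug-{a.label}", a.text) for a in self.attributes]
        if same_frame:
            # Optional links between attributes that belong to the same frame.
            rels += [(a.text, "same_frame", b.text)
                     for a, b in combinations(self.attributes, 2)]
        return rels


# Hypothetical regimen: the same drug at two successive strengths.
drug = Entity("Drug", "methotrexate")
initial = Frame(drug, [Entity("Strength", "10 mg"), Entity("Frequency", "weekly")])
adjusted = Frame(drug, [Entity("Strength", "15 mg"), Entity("Frequency", "weekly")])

for frame in (initial, adjusted):
    print(frame.relations(same_frame=True))
```

Keeping the two strengths in separate frames avoids attaching both to a single drug mention, which is the kind of ambiguity the frame concept is meant to remove.
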
Evaluation:
- The proposed architecture was trained and evaluated on the French corpus.
- The portability of the approach was assessed by evaluating it on the English n2c2 dataset.
- Performance was compared to an existing transformer-based relation classification method, focusing on information extraction accuracy (Precision, Recall, F1-score) and computational cost (training time).
- End-to-end performance was evaluated using the NLStruct library for NER and the proposed method for RE sequentially.
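
The accuracy metrics named above can be computed from sets of gold and predicted relations. The following is a minimal sketch that assumes exact-match scoring over (head, tail, type) triples; the paper's actual scoring rules may differ, and the example relations are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 over two collections of relations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted relations: (head, tail, relation type).
gold = {
    ("methotrexate", "10 mg", "Drug-Strength"),
    ("methotrexate", "weekly", "Drug-Frequency"),
    ("prednisone", "5 mg", "Drug-Strength"),
    ("prednisone", "oral", "Drug-Route"),
}
predicted = {
    ("methotrexate", "10 mg", "Drug-Strength"),
    ("prednisone", "5 mg", "Drug-Strength"),
    ("prednisone", "weekly", "Drug-Frequency"),  # wrong attachment
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In an end-to-end setting the predicted relations are built on predicted entities, so an entity missed or mislabeled at the NER step also removes every relation that depends on it. This is consistent with end-to-end F1 being lower than F1 for relation extraction alone.
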
Implementation Details: Pre-trained transformer models fine-tuned on clinical datasets were used (ClinicalBERT and BioBERT for English, CamemBERT-BIO for French). Hyperparameters were selected through grid search with cross-validation. The experiments were performed on a system equipped with an NVIDIA RTX 3070 GPU and an Intel Xeon CPU. Ecological impact was assessed using the Green Algorithms calculator.
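
The kind of estimate produced by Green Algorithms can be sketched as energy multiplied by the carbon intensity of the electricity used. The sketch below follows the general form of that model (runtime, processor power scaled by usage, memory power, and a data-centre efficiency factor) and omits the optional scaling factor for repeated runs. Every input value is an illustrative placeholder rather than a figure from the paper, and the per-GB memory constant should be checked against the tool's own documentation.

```python
# Memory power draw per GB; assumed here, verify against the Green Algorithms documentation.
MEMORY_WATTS_PER_GB = 0.3725


def energy_kwh(runtime_h, core_power_w, core_usage, memory_gb, pue=1.0):
    """Energy in kWh: runtime x (processor power x usage + memory power) x PUE."""
    power_w = core_power_w * core_usage + memory_gb * MEMORY_WATTS_PER_GB
    return runtime_h * power_w * pue / 1000.0


def carbon_g(energy, carbon_intensity_g_per_kwh):
    """Carbon footprint in gCO2e for a given energy use and grid carbon intensity."""
    return energy * carbon_intensity_g_per_kwh


# Placeholder inputs for one hypothetical training run.
energy = energy_kwh(runtime_h=10, core_power_w=200, core_usage=1.0, memory_gb=32)
footprint = carbon_g(energy, carbon_intensity_g_per_kwh=50)
print(f"{energy:.2f} kWh, {footprint:.1f} gCO2e")
```

Because runtime enters the formula linearly, a method that trains several times faster on the same hardware reduces the estimated footprint of a hyperparameter search by roughly the same factor.
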

Results:

Relation Extraction (RE): The proposed architecture achieved competitive performance on both French and English datasets compared to the state-of-the-art, with F1-measures of 0.82 and 0.96, respectively. Crucially, the method reduced computational cost by approximately 10x compared to the baseline transformer-based approach on the French corpus and 23x on the English corpus.
End-to-End Performance: End-to-end F1 performance was 0.69 for the French corpus and 0.82 for the English corpus.
Error Analysis: Recall was lower because modifiers that apply to multiple drugs were difficult to handle. Adding same-frame relations between drug attributes slightly improved results.

Discussion:

The paper highlights the effectiveness of transformer-based architectures for relation extraction in clinical text, even with the inherent complexities of human annotation.
The proposed method demonstrates robustness across languages (French and English) and different types of clinical data.
The frame-based representation of drugs and their attributes is reported to be beneficial: adding same-frame relations between drug attributes slightly improved extraction results.
The computational efficiency of the architecture makes it particularly suitable for deployment in resource-limited hospital environments.
The carbon impact of the hyperparameter optimization process is quantified, raising awareness about the environmental cost of machine learning model development.

Key Contributions:

A novel and computationally efficient architecture for relation extraction: The proposed architecture significantly reduces the computational cost of relation extraction while maintaining state-of-the-art performance.
A frame-based annotation scheme for representing medication regimens: The authors redefine how relationships between drugs and associated entities should be annotated to offer a more precise and comprehensive representation of therapeutic regimens.

Conclusion:

The proposed architecture offers a high-performance and low-cost solution for extracting medication information from clinical text, making it well-suited for hospitals with limited IT resources. The paper validates the method on both English and French clinical data, demonstrating its potential for broader application in multilingual clinical settings. The approach maintains competitive extraction performance while addressing the challenges of computational cost and data representation.
