
Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression (2311.10809v1)

Published 17 Nov 2023 in cs.AI

Abstract: This study aimed to use text processing and NLP models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model trained on data produced by different regular expression (RE) methods. RE methods at two levels of complexity were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model, and its performance was evaluated against manually labeled gold standards. Comparing the RE methods with the gold standard showed that, as the complexity of the RE algorithms increased, the F1 score rose from 0.3-0.4 to around 0.9. The NER models produced excellent predictions: the simple RE method scored 0.84-0.92 on the evaluation metrics, and the advanced and combined RE methods scored 0.95-0.99. This study provides an example of the benefit of combining NER methods and NLP models to convert target information from free text into structured data and to fill in diagnoses that are missing from unstructured notes.
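
As a rough illustration of the pipeline the abstract describes, the sketch below uses a simple and a more complex regular expression to produce character-offset labels of the kind used as NER training data, then scores each against a hand-labeled span with exact-match F1. The abstract does not give the actual expressions, so both patterns, the label name `PERIO_DX`, the example note, and the helper functions `label_spans` and `span_f1` are hypothetical and stand in for the authors' methods only loosely.

```python
import re

# Hypothetical patterns: the paper's real expressions are not given in the abstract.
# "Simple" matches only the bare disease term.
SIMPLE_RE = re.compile(r"periodontitis", re.IGNORECASE)

# "Advanced" also captures optional extent, stage, and grade modifiers.
ADVANCED_RE = re.compile(
    r"(?:(?:generalized|localized)\s+)?"
    r"(?:stage\s+(?:IV|III|II|I)\s+)?"
    r"(?:grade\s+[ABC]\s+)?"
    r"periodontitis",
    re.IGNORECASE,
)


def label_spans(text, pattern, label="PERIO_DX"):
    """Return (start, end, label) character spans for every regex match."""
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


def span_f1(predicted, gold):
    """Exact-match F1 between two collections of (start, end, label) spans."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up clinical note and its hand-labeled gold span (full diagnosis phrase).
note = "Dx: generalized stage III grade B periodontitis. Recall in 3 months."
gold_start = note.index("generalized")
gold_end = note.index("periodontitis") + len("periodontitis")
gold = [(gold_start, gold_end, "PERIO_DX")]

for name, pattern in [("simple", SIMPLE_RE), ("advanced", ADVANCED_RE)]:
    spans = label_spans(note, pattern)
    print(name, spans, span_f1(spans, gold))
```

On this single made-up note, the simple pattern labels only the word "periodontitis" and so scores 0.0 under exact matching, while the advanced pattern recovers the full diagnosis phrase and scores 1.0. This is a toy analogue of the reported trend that more complex expressions agree better with the gold standard, not a reproduction of the reported figures.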

Authors (7)
  1. Yao-Shun Chuang (5 papers)
  2. Chun-Teh Lee (5 papers)
  3. Ryan Brandon (3 papers)
  4. Trung Duong Tran (2 papers)
  5. Oluwabunmi Tokede (3 papers)
  6. Muhammad F. Walji (7 papers)
  7. Xiaoqian Jiang (59 papers)
Citations (2)