Extracting UMLS Concepts from Medical Text Using General and Domain-Specific Deep Learning Models (1910.01274v1)
Abstract: Entity recognition is a critical first step for a number of clinical NLP applications, such as entity linking and relation extraction. We present the first attempt to apply state-of-the-art entity recognition approaches to a newly released dataset, MedMentions. This dataset contains over 4000 biomedical abstracts, annotated for UMLS semantic types. Compared to existing datasets, MedMentions contains a far greater number of entity types, and thus represents a more challenging but realistic real-world scenario. We explore a number of relevant dimensions, including the use of contextual versus non-contextual word embeddings, general versus domain-specific unsupervised pre-training, and different deep learning architectures. We contrast our results against the well-known i2b2 2010 entity recognition dataset, and propose a new method to combine general and domain-specific information. While producing a state-of-the-art result for the i2b2 2010 task (F1 = 0.90), our results on MedMentions are significantly lower (F1 = 0.63), suggesting there is still plenty of opportunity for improvement on this new data.
- Kathleen C. Fraser
- Isar Nejadgholi
- Berry De Bruijn
- Muqun Li
- Astha LaPlante
- Khaldoun Zine El Abidine
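To make the idea of combining general and domain-specific information more concrete, below is a minimal sketch, not the paper's actual architecture, of one common way to do it: concatenating pre-computed token embeddings from a general-domain encoder and a biomedical encoder before a BiLSTM tagger. All dimensions, label counts, and the random tensors standing in for real contextual embeddings are illustrative assumptions.

```python
# Hedged sketch: combine general-domain and domain-specific token embeddings
# by concatenation, then tag each token with a BiLSTM + linear classifier.
# (Illustrative only; the paper's exact combination method may differ.)
import torch
import torch.nn as nn

class ConcatEmbeddingTagger(nn.Module):
    def __init__(self, general_dim, domain_dim, hidden_dim, num_labels):
        super().__init__()
        self.lstm = nn.LSTM(general_dim + domain_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, general_emb, domain_emb):
        # general_emb: (batch, seq_len, general_dim), e.g. from a general-domain encoder
        # domain_emb:  (batch, seq_len, domain_dim),  e.g. from a biomedical encoder
        combined = torch.cat([general_emb, domain_emb], dim=-1)
        hidden, _ = self.lstm(combined)
        return self.classifier(hidden)  # per-token logits over BIO-style labels

# Toy usage: random tensors stand in for real contextual embeddings;
# num_labels is an arbitrary placeholder for a BIO label set.
tagger = ConcatEmbeddingTagger(general_dim=1024, domain_dim=768,
                               hidden_dim=256, num_labels=43)
logits = tagger(torch.randn(2, 50, 1024), torch.randn(2, 50, 768))
print(logits.shape)  # torch.Size([2, 50, 43])
```

Training such a tagger with a token-level cross-entropy (or CRF) loss on MedMentions-style BIO annotations would follow the standard sequence-labeling recipe; the sketch only shows how the two embedding sources can be fused.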