Adaptive Name Entity Recognition under Highly Unbalanced Data (2003.10296v1)

Published 10 Mar 2020 in cs.CL, cs.LG, and stat.ML

Abstract: Named Entity Recognition (NER) plays an important role in several NLP applications, such as Information Extraction, Sentiment Analysis, and chatbots, because it identifies entities in text and assigns them to predefined categories such as persons, locations, quantities, organizations, or percentages. In this report, we present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a bidirectional LSTM (Bi-LSTM) layer for solving NER tasks. We also employ a fused input of embedding vectors (GloVe, BERT), which are pre-trained on large corpora, to boost the generalization capacity of the model. Unfortunately, because the class distribution across the training data is heavily unbalanced, both approaches performed poorly on classes with few training samples. To overcome this challenge, we introduce an add-on classification model that splits sentences into two sets, Weak and Strong classes, and then design a pair of Bi-LSTM-CRF models to optimize performance on each set. We evaluated our models on the test set and found that our method significantly improves performance on the Weak classes while using only a very small data set (approximately 0.45%) compared to the remaining classes.
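The routing idea in the abstract (a sentence-level classifier that sends each sentence to one of two taggers) can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the paper: the names `make_router`, `toy_is_weak`, `toy_weak_tagger`, and `toy_strong_tagger`, the choice of a percent sign as the "Weak" signal, and the label names are assumptions made for illustration, and the toy rule-based functions merely stand in for the trained classifier and the two Bi-LSTM-CRF models.

```python
from typing import Callable, List, Sequence

# A tagger maps a tokenized sentence to one label per token.
Tagger = Callable[[Sequence[str]], List[str]]


def make_router(
    is_weak: Callable[[Sequence[str]], bool],
    weak_tagger: Tagger,
    strong_tagger: Tagger,
) -> Tagger:
    """Build a tagger that dispatches each sentence to one of two models.

    `is_weak` stands in for the add-on sentence classifier; the two
    taggers stand in for the models tuned on the Weak and Strong sets.
    """

    def tag(tokens: Sequence[str]) -> List[str]:
        chosen = weak_tagger if is_weak(tokens) else strong_tagger
        labels = chosen(tokens)
        if len(labels) != len(tokens):
            raise ValueError("tagger must return one label per token")
        return labels

    return tag


# Toy stand-ins (assumptions, not the paper's models).
def toy_is_weak(tokens: Sequence[str]) -> bool:
    # Pretend that sentences containing a percent sign belong to the Weak set.
    return any(t == "%" for t in tokens)


def toy_weak_tagger(tokens: Sequence[str]) -> List[str]:
    return [
        "B-PERCENT" if t.isdigit() else "I-PERCENT" if t == "%" else "O"
        for t in tokens
    ]


def toy_strong_tagger(tokens: Sequence[str]) -> List[str]:
    # Crude heuristic: capitalized tokens are tagged as person names.
    return ["B-PER" if t[:1].isupper() else "O" for t in tokens]


tag_sentence = make_router(toy_is_weak, toy_weak_tagger, toy_strong_tagger)
```

Calling `tag_sentence(["up", "5", "%"])` goes through the Weak-set tagger, while `tag_sentence(["Alice", "left"])` goes through the Strong-set tagger. In a real system each callable would wrap a trained model rather than a hand-written rule.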


Authors (3)

Thong Nguyen
Duy Nguyen
Pramod Rao

