Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks (2210.10343v2)

Published 19 Oct 2022 in cs.CL and cs.AI

Abstract: Data augmentation techniques have been used to alleviate the problem of scarce labeled data in various NER tasks (flat, nested, and discontinuous NER). Existing augmentation techniques either manipulate words in the original text, which breaks its semantic coherence, or exploit generative models that do not preserve the entities of the original text, which impedes their use on nested and discontinuous NER tasks. In this work, we propose EnTDA, a novel Entity-to-Text based data augmentation technique that adds, deletes, replaces, or swaps entities in the entity list of the original text and uses these augmented entity lists to generate semantically coherent, entity-preserving texts for various NER tasks. We further introduce a diversity beam search to increase diversity during the text generation process. Experiments on thirteen NER datasets across three tasks (flat, nested, and discontinuous NER) and two settings (full-data and low-resource) show that EnTDA brings larger performance improvements than baseline augmentation techniques.
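
The entity-list operations named in the abstract (add, delete, replace, swap) can be illustrated with a minimal sketch. The function name `augment_entity_list` and the `candidate_pool` source of substitute entities below are illustrative assumptions, not the paper's actual implementation; in EnTDA the augmented list would then condition an entity-to-text generator with diversity beam search.

```python
import random

def augment_entity_list(entities, candidate_pool, op="replace"):
    """Sketch of the four entity-list operations (add / delete / replace / swap).

    entities:       list of entity mentions from the original sentence
    candidate_pool: same-type entities to draw from (an assumption here,
                    not the paper's exact sampling procedure)
    """
    augmented = list(entities)
    if op == "add" and candidate_pool:
        # Insert a new entity at a random position in the list.
        augmented.insert(random.randrange(len(augmented) + 1),
                         random.choice(candidate_pool))
    elif op == "delete" and len(augmented) > 1:
        # Drop one entity, keeping at least one in the list.
        augmented.pop(random.randrange(len(augmented)))
    elif op == "replace" and candidate_pool:
        # Substitute one entity with another from the pool.
        augmented[random.randrange(len(augmented))] = random.choice(candidate_pool)
    elif op == "swap" and len(augmented) > 1:
        # Exchange the positions of two entities.
        i, j = random.sample(range(len(augmented)), 2)
        augmented[i], augmented[j] = augmented[j], augmented[i]
    return augmented

# Example: the augmented list would then be fed to a text generator to
# produce a new, semantically coherent sentence that preserves these entities.
print(augment_entity_list(["Barack Obama", "Hawaii"],
                          ["Chicago", "New York"], op="replace"))
```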

Authors (8)
  1. Xuming Hu (120 papers)
  2. Yong Jiang (194 papers)
  3. Aiwei Liu (42 papers)
  4. Zhongqiang Huang (20 papers)
  5. Pengjun Xie (85 papers)
  6. Fei Huang (408 papers)
  7. Lijie Wen (58 papers)
  8. Philip S. Yu (592 papers)
Citations (9)