Simple Questions Generate Named Entity Recognition Datasets (2112.08808v4)

Published 16 Dec 2021 in cs.CL

Abstract: Recent named entity recognition (NER) models often rely on human-annotated datasets, which require substantial expert knowledge of the target domain and its entity types. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural-language questions (e.g., "Which disease?") to an open-domain question answering system. Despite using fewer in-domain resources, our models, trained solely on the generated datasets, outperform strong low-resource models by an average of 19.4 F1 points across six popular NER benchmarks. Furthermore, our models perform competitively with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art performance.
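
The ask-to-generate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the QA system is replaced by a hypothetical stub (`toy_qa_system`) that returns canned sentence and answer pairs, and the example sentences, the `DISEASE` label, and the exact-match BIO tagging are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# (sentence, answer phrase) pairs, as a QA system might return them.
QAResult = Tuple[str, str]


def toy_qa_system(question: str) -> List[QAResult]:
    """Hypothetical stand-in for an open-domain QA system.

    A real system would retrieve sentences and answer phrases from a
    large corpus; here the results are hard-coded for illustration.
    """
    canned = {
        "Which disease?": [
            ("Patients with diabetes often need insulin .", "diabetes"),
            ("Asthma is treated with inhalers .", "Asthma"),
        ],
    }
    return canned.get(question, [])


def bio_tag(sentence: str, answer: str, label: str) -> List[Tuple[str, str]]:
    """Tag the first exact occurrence of `answer` in `sentence` with BIO labels."""
    tokens = sentence.split()
    answer_tokens = answer.split()
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            break
    return list(zip(tokens, tags))


def generate_dataset(
    question: str,
    label: str,
    qa: Callable[[str], List[QAResult]] = toy_qa_system,
) -> List[List[Tuple[str, str]]]:
    """Ask one simple question and convert every answer into a tagged sentence."""
    return [bio_tag(sentence, answer, label) for sentence, answer in qa(question)]


if __name__ == "__main__":
    for tagged in generate_dataset("Which disease?", "DISEASE"):
        print(tagged)
```

Each question maps to one entity type, so a multi-type dataset would, under the same assumptions, be built by calling `generate_dataset` once per question and merging the results.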

Authors (5)
  1. Hyunjae Kim (25 papers)
  2. Jaehyo Yoo (5 papers)
  3. Seunghyun Yoon (64 papers)
  4. Jinhyuk Lee (27 papers)
  5. Jaewoo Kang (83 papers)
Citations (7)