
Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training (2109.05003v1)

Published 10 Sep 2021 in cs.CL and cs.LG

Abstract: We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
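To make the setup concrete, here is a minimal, self-contained sketch of the two ingredients the abstract mentions: distant labeling by dictionary matching against a knowledge base, and a noise-robust loss. The toy knowledge base and the generalized cross-entropy form of the loss are illustrative assumptions here, not the paper's exact implementation:

```python
import math

def distant_label(tokens, kb):
    """Assign entity types by exact dictionary matching against a toy KB.
    Unmatched tokens get the 'O' (outside) label, so distant labels are
    incomplete (missed mentions) and potentially noisy (wrong types)."""
    return [kb.get(tok.lower(), "O") for tok in tokens]

def noise_robust_loss(p, q=0.7):
    """Generalized cross-entropy L_q(p) = (1 - p^q) / q, one common
    noise-robust choice (the paper's loss may differ).
    As q -> 0 it recovers standard cross-entropy -log(p); at q = 1 it is
    a bounded, MAE-like loss that penalizes mislabeled tokens less."""
    return (1.0 - p ** q) / q

# Hypothetical KB and sentence for illustration.
kb = {"paris": "LOC", "google": "ORG", "einstein": "PER"}
tokens = ["Google", "opened", "an", "office", "in", "Paris"]
labels = distant_label(tokens, kb)
# labels -> ['ORG', 'O', 'O', 'O', 'O', 'LOC']

# For a low-confidence (possibly mislabeled) token, the robust loss is
# bounded, unlike standard cross-entropy which grows without limit.
robust = noise_robust_loss(0.1)
standard = -math.log(0.1)
```

The key design point: because distant labels are unreliable, the loss must not let a few badly mislabeled tokens dominate the gradient, which is exactly what a bounded loss achieves.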

Authors (7)
  1. Yu Meng (92 papers)
  2. Yunyi Zhang (39 papers)
  3. Jiaxin Huang (48 papers)
  4. Xuan Wang (205 papers)
  5. Yu Zhang (1400 papers)
  6. Heng Ji (266 papers)
  7. Jiawei Han (263 papers)
Citations (66)
