CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition (2305.14913v1)

Published 24 May 2023 in cs.CL

Abstract: Cross-lingual named entity recognition (NER) aims to train an NER system that generalizes well to a target language by leveraging labeled data in a given source language. Previous work alleviates the data scarcity problem by translating source-language labeled data or performing knowledge distillation on target-language unlabeled data. However, these methods may suffer from label noise due to the automatic labeling process. In this paper, we propose CoLaDa, a Collaborative Label Denoising Framework, to address this problem. Specifically, we first explore a model-collaboration-based denoising scheme that enables models trained on different data sources to collaboratively denoise pseudo labels used by each other. We then present an instance-collaboration-based strategy that considers the label consistency of each token's neighborhood in the representation space for denoising. Experiments on different benchmark datasets show that the proposed CoLaDa achieves superior results compared to previous methods, especially when generalizing to distant languages.
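The instance-collaboration strategy described above scores each pseudo-labeled token by how consistently its nearest neighbors in representation space carry the same label. The following is a minimal sketch of that idea, not the paper's actual implementation: it assumes token representations as a NumPy array, uses cosine similarity for the neighborhood, and returns the fraction of each token's k nearest neighbors that agree with its pseudo label as a reliability score.

```python
import numpy as np

def neighborhood_consistency(reps, pseudo_labels, k=3):
    """Score pseudo-label reliability by neighborhood label consistency.

    reps: (n, d) array of token representations (hypothetical input).
    pseudo_labels: (n,) array of pseudo label ids.
    Returns an (n,) array in [0, 1]; higher means the token's k nearest
    neighbors (by cosine similarity) mostly share its pseudo label.
    """
    # Normalize so the dot product equals cosine similarity.
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude each token itself

    scores = np.empty(len(reps))
    for i in range(len(reps)):
        nbrs = np.argsort(sims[i])[-k:]  # indices of k most similar tokens
        scores[i] = np.mean(pseudo_labels[nbrs] == pseudo_labels[i])
    return scores
```

Low-consistency tokens can then be down-weighted or dropped when training the student model, which is the role this signal plays in the denoising framework.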

Authors (6)
  1. Tingting Ma (6 papers)
  2. Qianhui Wu (19 papers)
  3. Huiqiang Jiang (32 papers)
  4. Börje F. Karlsson (27 papers)
  5. Tiejun Zhao (70 papers)
  6. Chin-Yew Lin (22 papers)
Citations (4)