Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging (2210.09840v2)

Published 18 Oct 2022 in cs.CL

Abstract: Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we create a graph with words as nodes and alignment links as edges by aligning words for all language pairs. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state-of-the-art for unsupervised POS tagging of low-resource languages.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Ayyoob Imani (16 papers)
  2. Silvia Severini (7 papers)
  3. Masoud Jalili Sabet (12 papers)
  4. François Yvon (49 papers)
  5. Hinrich Schütze (250 papers)
Citations (4)