Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation (2109.00194v2)

Published 1 Sep 2021 in cs.CL and cs.LG

Abstract: Recent multilingual pre-trained LLMs have achieved remarkable zero-shot performance, where the model is only finetuned on one source language and directly evaluated on target languages. In this work, we propose a self-learning framework that further utilizes unlabeled data of target languages, combined with uncertainty estimation in the process to select high-quality silver labels. Three different uncertainties are adapted and analyzed specifically for cross-lingual transfer: Language Heteroscedastic/Homoscedastic Uncertainty (LEU/LOU) and Evidential Uncertainty (EVI). We evaluate our framework with these uncertainties on two cross-lingual tasks, Named Entity Recognition (NER) and Natural Language Inference (NLI), covering 40 languages in total; it outperforms the baselines significantly, by 10 F1 points on average for NER and 2.5 accuracy points for NLI.

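The abstract describes a self-learning loop in which a source-finetuned model pseudo-labels unlabeled target-language data and an uncertainty score decides which silver labels are kept for further finetuning. The sketch below is a minimal illustration of that general idea, not the paper's implementation: it uses plain predictive entropy as a stand-in for the LEU/LOU/EVI estimators, and the `select_silver_labels` / `self_learning` helpers and the `model.finetune` / `model.predict_proba` interface are hypothetical names introduced for illustration.

```python
import numpy as np

def select_silver_labels(probs, keep_ratio=0.3):
    """Select low-uncertainty predictions on unlabeled target-language data.

    probs: (n_examples, n_classes) class probabilities from the
           source-finetuned model. Predictive entropy is used here as a
           generic uncertainty score; the paper's LEU/LOU/EVI estimators
           would be plugged in at this step instead.
    Returns indices of kept examples and their silver (pseudo) labels.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    n_keep = max(1, int(len(probs) * keep_ratio))
    keep = np.argsort(entropy)[:n_keep]      # most certain examples first
    silver = probs[keep].argmax(axis=1)      # their predicted labels
    return keep, silver

def self_learning(model, source_data, target_unlabeled, rounds=3):
    """Hypothetical self-learning loop: finetune on labeled source data,
    then repeatedly pseudo-label, filter by uncertainty, and re-finetune."""
    model.finetune(source_data)                          # labeled source language
    for _ in range(rounds):
        probs = model.predict_proba(target_unlabeled)    # zero-shot predictions
        keep, silver = select_silver_labels(probs)
        silver_data = [(target_unlabeled[i], label) for i, label in zip(keep, silver)]
        model.finetune(source_data + silver_data)        # retrain with silver labels
    return model
```

The keep ratio and number of rounds are illustrative knobs; in practice the quality of the silver labels hinges on how well the chosen uncertainty measure separates reliable from unreliable predictions across languages.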
Authors (6)
  1. Liyan Xu (28 papers)
  2. Xuchao Zhang (44 papers)
  3. Xujiang Zhao (26 papers)
  4. Haifeng Chen (99 papers)
  5. Feng Chen (261 papers)
  6. Jinho D. Choi (67 papers)
Citations (14)