Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction (2109.06798v1)

Published 14 Sep 2021 in cs.CL

Abstract: Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.
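To make the two core techniques named in the abstract concrete, here is a minimal, generic sketch of each. Neither snippet is the paper's implementation: the label scheme, alignment format, classifier, features, and confidence threshold below are all illustrative assumptions.

Data projection transfers annotations from the source language onto parallel target-language text through word alignments. A hypothetical `project_labels` helper, assuming alignments are given as (source index, target index) pairs:

```python
# Data projection sketch: copy source-token labels onto aligned target tokens.
# The (src_idx, tgt_idx) alignment format is an assumption; real pipelines
# derive alignments from a word aligner run over parallel text.
def project_labels(src_labels, alignment, tgt_len, default="O"):
    tgt_labels = [default] * tgt_len
    for src_i, tgt_i in alignment:
        tgt_labels[tgt_i] = src_labels[src_i]
    return tgt_labels

# Example: a 4-token English NER sequence projected onto a hypothetical
# 4-token target sentence with a different word order.
src = ["B-PER", "O", "O", "B-LOC"]
print(project_labels(src, [(0, 3), (3, 0)], tgt_len=4))
# ['B-LOC', 'O', 'O', 'B-PER']
```

Self-training instead pseudo-labels unlabeled target-language data with a model trained on source-language gold labels, then retrains on the union. A toy loop with scikit-learn, where the features, classifier, and 0.9 threshold are stand-ins for choices the paper tunes per task:

```python
# Generic self-training sketch (NOT the paper's exact pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins: "English" labeled features and "target-language" unlabeled features.
X_src = rng.normal(size=(200, 16))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(size=(300, 16))

model = LogisticRegression(max_iter=1000).fit(X_src, y_src)

CONFIDENCE = 0.9  # assumed threshold
for _ in range(3):  # a few self-training rounds
    probs = model.predict_proba(X_tgt)
    confident = probs.max(axis=1) >= CONFIDENCE
    if not confident.any():
        break
    # Retrain on gold source data plus confident target pseudo-labels.
    X_aug = np.vstack([X_src, X_tgt[confident]])
    y_aug = np.concatenate([y_src, probs[confident].argmax(axis=1)])
    model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```

The paper's finding is that no single such technique dominates; combinations of projection, self-training, and encoder choice should be explored per task and language.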

Authors (13)
  1. Mahsa Yarmohammadi (8 papers)
  2. Shijie Wu (23 papers)
  3. Marc Marone (11 papers)
  4. Haoran Xu (77 papers)
  5. Seth Ebner (9 papers)
  6. Guanghui Qin (16 papers)
  7. Yunmo Chen (20 papers)
  8. Jialiang Guo (2 papers)
  9. Craig Harman (6 papers)
  10. Kenton Murray (37 papers)
  11. Aaron Steven White (29 papers)
  12. Mark Dredze (66 papers)
  13. Benjamin Van Durme (173 papers)
Citations (28)