Unknown Script: Impact of Script on Cross-Lingual Transfer (2404.18810v2)

Published 29 Apr 2024 in cs.CL

Abstract: Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect of this domain: the influence of the source language of an LLM on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models pre-trained with different tokenization methods to determine the factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal that the tokenizer is a stronger factor than shared script, language similarity, and model size.

Authors (3)
  1. Wondimagegnhue Tsegaye Tufa (3 papers)
  2. Ilia Markov (16 papers)
  3. Piek Vossen (23 papers)
