Analysis of Multi-Source Language Training in Cross-Lingual Transfer (2402.13562v2)

Published 21 Feb 2024 in cs.CL

Abstract: The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there is still ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT: that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We therefore suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.
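As a rough illustration of what MSLT amounts to in practice, the sketch below pools task data from several source languages, fine-tunes a multilingual LM on the pooled set, and evaluates zero-shot on a held-out target language. The model (XLM-R), benchmark (XNLI), language combination, subset sizes, and hyperparameters are illustrative assumptions, not the paper's exact setup, and the paper's heuristics for choosing source-language combinations are not reproduced here.

```python
# Minimal MSLT-style sketch (assumed setup): fine-tune a multilingual LM on
# training data pooled from several source languages, then evaluate zero-shot
# on a target language. Model, dataset, and hyperparameters are illustrative.
import numpy as np
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

SOURCE_LANGS = ["en", "de", "ru"]   # hypothetical source-language combination
TARGET_LANG = "sw"                  # zero-shot target language

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # XNLI: entailment / neutral / contradiction

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

# Pool a (subsampled) training split from every source language into one set.
train_sets = [
    load_dataset("xnli", lang, split="train").shuffle(seed=42).select(range(20_000))
    for lang in SOURCE_LANGS
]
train_data = concatenate_datasets(train_sets).shuffle(seed=42).map(tokenize, batched=True)

# The target-language test split is used only for evaluation (zero-shot transfer).
eval_data = load_dataset("xnli", TARGET_LANG, split="test").map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mslt-xnli",
                           per_device_train_batch_size=32,
                           num_train_epochs=2),
    train_dataset=train_data,
    eval_dataset=eval_data,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # zero-shot accuracy on the target language
```

Swapping the contents of SOURCE_LANGS is the knob the paper varies; the reported effect is that pooling multiple well-chosen source languages increases the overlap of language embedding spaces and can improve target-language performance.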

Authors (5)
  1. Seong Hoon Lim
  2. Taejun Yun
  3. Jinhyeon Kim
  4. Jihun Choi
  5. Taeuk Kim
Citations (1)