Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer (2309.10891v1)

Published 19 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Zero-shot cross-lingual transfer is a central task in multilingual NLP, allowing models trained in languages with sufficient training resources to generalize to low-resource languages. Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data to improve cross-lingual transferability, all of which are typically expensive to obtain. In this paper, we propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer of multilingual pretrained language models (PLMs) without the help of such external data. By incorporating code-switching and embedding mixup with self-augmentation, SALT effectively distills cross-lingual knowledge from the multilingual PLM and enhances its transferability on downstream tasks. Experimental results on XNLI and PAWS-X show that our method is able to improve zero-shot cross-lingual transferability without external data. Our code is available at https://github.com/luka-group/SALT.
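
The abstract names two augmentation ingredients: code-switching-style substitution generated by the multilingual PLM itself (self-augmentation, requiring no bilingual dictionary or parallel corpus) and embedding mixup. The sketch below only illustrates those two ideas, not the authors' exact SALT procedure; the xlm-roberta-base checkpoint, the 15% masking ratio, the greedy top-1 fills, and the Beta(0.2, 0.2) mixing coefficient are illustrative assumptions.

```python
# Minimal sketch of (1) PLM-driven token substitution and (2) embedding mixup.
# Assumptions: xlm-roberta-base, 15% masking, greedy fills, Beta(0.2, 0.2) mixup.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "xlm-roberta-base"  # any multilingual masked LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


def self_augment(sentence: str, mask_ratio: float = 0.15) -> str:
    """Replace a random subset of tokens with the PLM's own top predictions,
    so substitutes come from the model rather than external alignment data."""
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc["input_ids"].clone()
    n_tokens = ids.size(1) - 2                       # skip <s> and </s>
    n_mask = max(1, int(mask_ratio * n_tokens))
    positions = 1 + torch.randperm(n_tokens)[:n_mask]
    ids[0, positions] = tokenizer.mask_token_id      # mask the chosen positions
    with torch.no_grad():
        logits = model(input_ids=ids, attention_mask=enc["attention_mask"]).logits
    ids[0, positions] = logits[0, positions].argmax(dim=-1)  # greedy fill
    return tokenizer.decode(ids[0], skip_special_tokens=True)


def embedding_mixup(emb_a: torch.Tensor, emb_b: torch.Tensor,
                    alpha: float = 0.2) -> torch.Tensor:
    """Mixup (Zhang et al., ICLR 2018): convex combination of two embedding
    tensors of the same shape, e.g. a sentence and its augmented view."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * emb_a + (1.0 - lam) * emb_b
```

How these augmented views enter the fine-tuning objective, and how they are scheduled across source and target languages, is specified in the paper and the linked repository; the code above only shows the building blocks the abstract mentions.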

Authors (4)
  1. Fei Wang (573 papers)
  2. Kuan-Hao Huang (33 papers)
  3. Kai-Wei Chang (292 papers)
  4. Muhao Chen (159 papers)
Citations (3)