Soft Language Prompts for Language Transfer (2407.02317v2)

Published 2 Jul 2024 in cs.CL

Abstract: Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in NLP. This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer by incorporating language-specific and task-specific adapters and soft prompts. We present a detailed investigation of various combinations of these methods, exploring their efficiency across 16 languages with a focus on 10 mid- and low-resource languages. We further present, to our knowledge, the first use of soft prompts for language transfer, a technique we call soft language prompts. Our findings demonstrate that, in contrast to claims in previous work, a combination of language and task adapters does not always work best; instead, combining a soft language prompt with a task adapter outperforms most configurations in many cases.
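The core setup described in the abstract, a trainable soft prompt for the language paired with a lightweight task adapter on top of a frozen backbone, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the backbone, dimensions, prompt length, and adapter placement below are illustrative assumptions, chosen only to show which parameters stay frozen and which are trained.

```python
# Minimal sketch (not the authors' code): a soft "language" prompt prepended to the
# input embeddings plus a small bottleneck task adapter, with the backbone frozen.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the input embeddings."""
    def __init__(self, prompt_len: int, hidden_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

class BottleneckAdapter(nn.Module):
    """Residual down-/up-projection acting as a task-specific adapter."""
    def __init__(self, hidden_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))

# Toy frozen backbone standing in for a pretrained multilingual encoder.
hidden_dim, vocab = 512, 1000
embed = nn.Embedding(vocab, hidden_dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True), num_layers=2
)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad = False  # backbone stays frozen

lang_prompt = SoftPrompt(prompt_len=20, hidden_dim=hidden_dim)  # soft language prompt
task_adapter = BottleneckAdapter(hidden_dim)                    # task-specific module
classifier = nn.Linear(hidden_dim, 3)                           # e.g. NLI label head

tokens = torch.randint(0, vocab, (4, 16))                 # dummy batch of token ids
hidden = encoder(lang_prompt(embed(tokens)))              # prompt + frozen encoder
logits = classifier(task_adapter(hidden).mean(dim=1))     # adapter + task head
print(logits.shape)  # torch.Size([4, 3])
```

In this kind of setup, only the prompt, adapter, and classification head carry gradients, which is what makes the combination parameter-efficient; the language-specific and task-specific parts can also be trained separately and mixed across languages at inference time.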

Authors (3)
  1. Ivan Vykopal (8 papers)
  2. Simon Ostermann (26 papers)
  3. Marián Šimko (10 papers)