
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging (2402.18913v1)

Published 29 Feb 2024 in cs.CL and cs.AI

Abstract: As an effective alternative to direct fine-tuning on the target task in a specific language, cross-lingual transfer addresses the challenge of limited training data by decoupling "task ability" and "language ability": it fine-tunes on the target task in the source language and on another selected task in the target language, respectively. However, existing methods fail to fully separate the task ability from the source language or the language ability from the chosen task. In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on a given task. Since this gap removes the impact of the task, we assume that it remains consistent across tasks. Based on this assumption, we propose a new cross-lingual transfer method called $\texttt{AdaMergeX}$ that utilizes adaptive adapter merging. By introducing a reference task, we can determine that the divergence between adapters fine-tuned on the reference task in the two languages follows the same distribution as the divergence between adapters fine-tuned on the target task in the two languages. Hence, we can obtain the target adapter by combining the other three adapters. Furthermore, we propose a structure-adaptive adapter merging method. Our empirical results demonstrate that our approach yields a new and effective form of cross-lingual transfer, outperforming existing methods across all settings.
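In concrete terms, "combining the other three adapters" is a form of adapter arithmetic: the language gap measured on the reference task is assumed to carry over to the target task. The sketch below illustrates only the additive case, assuming LoRA-style adapters whose weight deltas compose linearly; the names (`merge_adapters`, `scale`) are illustrative, not the paper's actual interface, and the paper's structure-adaptive merging additionally selects a combination rule matched to the adapter architecture.

```python
# A minimal sketch of the adapter-merging idea from the abstract,
# assuming additive (LoRA-style) adapter weight deltas.
# merge_adapters and scale are hypothetical names, not the paper's API.
import torch


def merge_adapters(
    task_src: dict,      # adapter fine-tuned on the target task, source language
    ref_tgt: dict,       # adapter fine-tuned on the reference task, target language
    ref_src: dict,       # adapter fine-tuned on the reference task, source language
    scale: float = 1.0,  # assumed hyperparameter weighting the language gap
) -> dict:
    """Estimate the target-task, target-language adapter as
    task_src + scale * (ref_tgt - ref_src): the language gap measured on
    the reference task is transferred to the target task."""
    return {
        name: task_src[name] + scale * (ref_tgt[name] - ref_src[name])
        for name in task_src
    }


# Toy usage with three adapters sharing one parameter name.
a = {"layer0.lora_A": torch.randn(8, 768)}
b = {"layer0.lora_A": torch.randn(8, 768)}
c = {"layer0.lora_A": torch.randn(8, 768)}
target = merge_adapters(a, b, c)
print(target["layer0.lora_A"].shape)  # torch.Size([8, 768])
```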

Authors (5)
  1. Yiran Zhao
  2. Wenxuan Zhang
  3. Huiming Wang
  4. Kenji Kawaguchi
  5. Lidong Bing
Citations (13)