No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement (2404.15737v2)
Abstract: Modular deep learning is the state-of-the-art solution for lifting the curse of multilinguality, preventing the impact of negative interference and enabling cross-lingual performance in multilingual pre-trained language models. However, a trade-off of this approach is the reduction in positive transfer learning from closely related languages. In response, we introduce a novel method called language arithmetic, which enables training-free post-processing to address this limitation. Extending the task arithmetic framework, we apply learning via addition to the language adapters, transitioning the framework from a multi-task to a multilingual setup. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes, where it acts as a post-processing procedure. Language arithmetic consistently improves the baselines with significant gains, especially in the most challenging case of zero-shot application. Our code and models are available at https://github.com/mklimasz/language-arithmetic .
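To make the "learning via addition" idea concrete, below is a minimal sketch of combining two language adapters by element-wise weighted addition of their parameters. It assumes each adapter is a plain PyTorch state dict with matching keys; the single scaling factor `lam` and the file names are illustrative assumptions, not necessarily the exact formulation or interface used in the paper.

```python
# Minimal sketch: training-free combination of language adapters by
# element-wise weighted addition (task-arithmetic-style "learning via addition").
import torch

def language_arithmetic(adapter_a: dict, adapter_b: dict, lam: float = 0.5) -> dict:
    """Return a new adapter whose parameters are adapter_a + lam * adapter_b.

    Both inputs are assumed to be state dicts with identical keys and shapes.
    """
    combined = {}
    for key in adapter_a:
        combined[key] = adapter_a[key] + lam * adapter_b[key]
    return combined

# Hypothetical usage: merge a target-language adapter with a related-language
# adapter, then load the result into the model's adapter slot (no training).
# target = torch.load("adapter_target_lang.pt")   # file names are placeholders
# related = torch.load("adapter_related_lang.pt")
# merged = language_arithmetic(target, related, lam=0.3)
# model.load_adapter_state_dict(merged)           # hypothetical loading call
```

Because the combination is a pure post-processing step over stored adapter weights, it can be applied after training to any pair (or set) of language adapters without touching the backbone model.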
Authors: Mateusz Klimaszewski, Piotr Andruszkiewicz, Alexandra Birch