A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias (2404.00929v3)
Abstract: Building on the foundation of LLMs, Multilingual LLMs (MLLMs) have been developed to address the challenges of multilingual natural language processing, with the aim of transferring knowledge from high-resource to low-resource languages. However, significant limitations and challenges remain, such as language imbalance, multilingual alignment, and inherent bias. In this paper, we provide a comprehensive analysis of MLLMs, delving deeply into these critical issues. First, we present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. Second, we explore the multilingual training corpora of MLLMs and the multilingual datasets for downstream tasks, both of which are crucial for enhancing the cross-lingual capability of MLLMs. Third, we survey state-of-the-art studies of multilingual representations and investigate whether current MLLMs can learn a universal language representation. Fourth, we discuss bias in MLLMs, including its categories, evaluation metrics, and debiasing techniques. Finally, we examine existing challenges and point out promising research directions for MLLMs.
- S. Bansal, V. Garimella, A. Suhane, and A. Mukherjee, “Debiasing multilingual word embeddings: A case study of three Indian languages,” in HT ’21: 32nd ACM Conference on Hypertext and Social Media, Virtual Event, Ireland, 30 August 2021 - 2 September 2021, 2021, pp. 27–34.
- K. Karkkainen and J. Joo, “FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation,” in Proc. IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 1548–1558.
- P. P. Liang, I. M. Li, E. Zheng, Y. C. Lim, R. Salakhutdinov, and L.-P. Morency, “Towards debiasing sentence representations,” in Proc. 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, 2020, pp. 5502–5515.
- S. Ravfogel, Y. Elazar, H. Gonen, M. Twiton, and Y. Goldberg, “Null it out: Guarding protected attributes by iterative nullspace projection,” in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7237–7256.
- Z. Yang, Y. Yang, D. Cer, and E. Darve, “A simple and effective method to eliminate the self language bias in multilingual representations,” in Proc. 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, Nov. 2021.
- K. Webster, X. Wang, I. Tenney, A. Beutel, E. Pitler, E. Pavlick, J. Chen, and S. Petrov, “Measuring and reducing gendered correlations in pre-trained models,” CoRR, vol. abs/2010.06032, 2020.
- N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
- F. Zhou, Y. Mao, L. Yu, Y. Yang, and T. Zhong, “Causal-debias: Unifying debiasing in pretrained language models and fine-tuning via causal invariant learning,” in Proc. 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, Jul. 2023, pp. 4227–4241.
- L. Ranaldi, E. S. Ruzzetti, D. Venditti, D. Onorati, and F. M. Zanzotto, “A trip towards fairness: Bias and de-biasing in large language models,” CoRR, vol. abs/2305.13862, 2023.
- E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” CoRR, vol. abs/2106.09685, 2021.
- A. Wang and O. Russakovsky, “Overwriting pretrained bias with finetuning data,” in Proc. IEEE/CVF International Conference on Computer Vision, 2023, pp. 3957–3968.
- Y. Guo, Y. Yang, and A. Abbasi, “Auto-debias: Debiasing masked language models with automated biased prompts,” in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, May 2022, pp. 1012–1023.
- J. Mattern, Z. Jin, M. Sachan, R. Mihalcea, and B. Schölkopf, “Understanding stereotypes in language models: Towards robust measurement and zero-shot debiasing,” CoRR, vol. abs/2212.10678, 2022.
- H. Dhingra, P. Jayashanker, S. Moghe, and E. Strubell, “Queer people are people first: Deconstructing sexual identity stereotypes in large language models,” CoRR, vol. abs/2307.00101, 2023.
- T. Schick, S. Udupa, and H. Schütze, “Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP,” Transactions of the Association for Computational Linguistics, pp. 1408–1424, Dec. 2021.
- A. Conneau, R. Rinott, G. Lample, A. Williams, S. Bowman, H. Schwenk, and V. Stoyanov, “XNLI: Evaluating cross-lingual sentence representations,” in Proc. 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, Oct.-Nov. 2018, pp. 2475–2485. [Online]. Available: https://aclanthology.org/D18-1269
- T. Nguyen, C. V. Nguyen, V. D. Lai, H. Man, N. T. Ngo, F. Dernoncourt, R. A. Rossi, and T. H. Nguyen, “CulturaX: A cleaned, enormous, and multilingual dataset for large language models in 167 languages,” 2023.
- H. Laurençon et al., “The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset,” Advances in Neural Information Processing Systems, vol. 35, pp. 31809–31826, 2022.
- J. Kreutzer et al., “Quality at a glance: An audit of web-crawled multilingual datasets,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 50–72, 2022. [Online]. Available: https://aclanthology.org/2022.tacl-1.4
- I. Sen, D. Assenmacher, M. Samory, I. Augenstein, W. van der Aalst, and C. Wagner, “People make better edits: Measuring the efficacy of LLM-generated counterfactually augmented data for harmful language detection,” in Proc. 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec. 2023, pp. 10480–10504. [Online]. Available: https://aclanthology.org/2023.emnlp-main.649
- J. Zhao, T. Wang, M. Yatskar, R. Cotterell, V. Ordonez, and K.-W. Chang, “Gender bias in contextualized word embeddings,” in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, Jun. 2019, pp. 629–634. [Online]. Available: https://aclanthology.org/N19-1064
- L. Yang, J. Li, P. Cunningham, Y. Zhang, B. Smyth, and R. Dong, “Exploring the efficacy of automatically generated counterfactuals for sentiment analysis,” in Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, Aug. 2021, pp. 306–316. [Online]. Available: https://aclanthology.org/2021.acl-long.26
- I. Sen, M. Samory, F. Flöck, C. Wagner, and I. Augenstein, “How does counterfactually augmented data impact models for social computing constructs?” in Proc. 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, Nov. 2021, pp. 325–344. [Online]. Available: https://aclanthology.org/2021.emnlp-main.28
- S. Goldfarb-Tarrant, A. Lopez, R. Blanco, and D. Marcheggiani, “Bias beyond English: Counterfactual tests for bias in sentiment analysis in four languages,” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics, Jul. 2023, pp. 4458–4468. [Online]. Available: https://aclanthology.org/2023.findings-acl.272
- I. Sen, M. Samory, C. Wagner, and I. Augenstein, “Counterfactually augmented data and unintended bias: The case of sexism and hate speech detection,” in Proc. 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, Jul. 2022, pp. 4716–4726. [Online]. Available: https://aclanthology.org/2022.naacl-main.347
- N. Joshi and H. He, “An investigation of the (in)effectiveness of counterfactually augmented data,” in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, May 2022, pp. 3668–3681. [Online]. Available: https://aclanthology.org/2022.acl-long.256
- H. Yadav and S. Sitaram, “A survey of multilingual models for automatic speech recognition,” in Proc. Thirteenth Language Resources and Evaluation Conference, Marseille, France, Jun. 2022, pp. 5071–5079. [Online]. Available: https://aclanthology.org/2022.lrec-1.542
- J. Hu, S. Ruder, A. Siddhant, G. Neubig, O. Firat, and M. Johnson, “XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization,” CoRR, vol. abs/2003.11080, 2020.
- P. Dufter and H. Schütze, “Identifying elements essential for BERT’s multilinguality,” in Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 4423–4437. [Online]. Available: https://aclanthology.org/2020.emnlp-main.358
- A. Nzeyimana and A. Niyongabo Rubungo, “KinyaBERT: A morphology-aware Kinyarwanda language model,” in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, May 2022, pp. 5347–5363. [Online]. Available: https://aclanthology.org/2022.acl-long.367
- H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Barnes, and A. Mian, “A comprehensive overview of large language models,” CoRR, vol. abs/2307.06435, 2023.
- X. Pan, B. Zhang, J. May, J. Nothman, K. Knight, and H. Ji, “Cross-lingual name tagging and linking for 282 languages,” in Proc. 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Jul. 2017, pp. 1946–1958. [Online]. Available: https://aclanthology.org/P17-1178
- F. Liu, E. Bugliarello, E. M. Ponti, S. Reddy, N. Collier, and D. Elliott, “Visually grounded reasoning across languages and cultures,” CoRR, vol. abs/2109.13238, 2021.