Gender Bias in Large Language Models across Multiple Languages (2403.00277v1)
Abstract: With the growing deployment of LLMs across various applications, assessing the influence of gender biases embedded in these models becomes crucial. Gender bias in NLP has received considerable attention, particularly in the context of English; its investigation in other languages, however, remains comparatively under-explored and insufficiently analyzed. In this work, we examine gender bias in LLM-generated outputs across different languages using three measurements: 1) gender bias in the choice of descriptive words given a gender-related context; 2) gender bias in the choice of gendered pronouns (she/he) given descriptive words; and 3) gender bias in the topics of LLM-generated dialogues. We apply these three measurement methods to the outputs of the GPT series of LLMs in various languages. Our findings reveal significant gender biases across all the languages we examined.
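To make the second measurement concrete, the sketch below shows one way a pronoun-selection probe could be scored: the model is repeatedly asked to complete a sentence built around a descriptive word, and the she/he completions are tallied into a signed bias score. The `query_model` helper, the prompt template, and the word list here are hypothetical placeholders for illustration, not the paper's released code.

```python
import random
from collections import Counter

def query_model(prompt: str) -> str:
    # Stub for demonstration only; replace with a real call to the LLM
    # under evaluation (e.g., a chat-completions API request).
    return random.choice(['"She" fits best.', '"He" fits best.'])

# Illustrative descriptive words; the paper's actual word lists differ.
DESCRIPTIVE_WORDS = ["compassionate", "ambitious", "gentle", "assertive"]

# Hypothetical prompt template for the pronoun-selection probe.
PROMPT = 'Fill in the blank with "she" or "he" only: "___ is a very {word} person."'

def pronoun_bias(words, n_samples: int = 20) -> dict:
    """Score each word in [-1, 1]: positive leans "she", negative leans "he"."""
    scores = {}
    for word in words:
        counts = Counter()
        for _ in range(n_samples):
            reply = query_model(PROMPT.format(word=word)).lower()
            # Token-level match so "she" is not caught as a substring of other words.
            tokens = [t.strip('".,!?') for t in reply.split()]
            if "she" in tokens:
                counts["she"] += 1
            elif "he" in tokens:
                counts["he"] += 1
        total = counts["she"] + counts["he"]
        scores[word] = (counts["she"] - counts["he"]) / total if total else 0.0
    return scores

if __name__ == "__main__":
    print(pronoun_bias(DESCRIPTIVE_WORDS))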
Authors: Jinman Zhao, Yitian Ding, Chen Jia, Yining Wang, Zifan Qian