A Group Fairness Lens for Large Language Models (2312.15478v1)
Abstract: The rapid advancement of LLMs has revolutionized various applications but has also raised crucial concerns about their potential to perpetuate biases and unfairness when deployed in social media contexts. Evaluating LLMs' potential biases and fairness has thus become essential, yet existing methods rely on limited prompts that focus on just a few groups and lack a comprehensive categorical perspective. In this paper, we propose evaluating LLM biases from a group fairness lens using a novel hierarchical schema that characterizes diverse social groups. Specifically, we construct a dataset, GFair, encapsulating target-attribute combinations across multiple dimensions. In addition, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate these biases from a group fairness perspective, we pioneer a novel chain-of-thought method, GF-Think. Experimental results demonstrate its efficacy in mitigating bias in LLMs and achieving fairness.
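To make the statement-organization task and the GF-Think idea more concrete, here is a minimal, hypothetical sketch of how such prompts could be assembled. The group, attributes, wording, and function names below are illustrative assumptions, not the paper's actual GFair templates or the exact GF-Think prompt.

```python
# Illustrative sketch only: the exact GFair prompts and the GF-Think template are not
# given in the abstract, so the wording below is a hypothetical approximation.

TARGET = "elderly people"            # a social group ("target") from one dimension
ATTRIBUTES = [                       # candidate attributes, mixing stereotyped and neutral ones
    "are bad with technology",
    "enjoy spending time with family",
    "contribute valuable experience at work",
]

def statement_organization_prompt(target: str, attributes: list[str]) -> str:
    """Open-ended generation: ask the model to organize target-attribute statements
    into a passage, then inspect which attributes it foregrounds or endorses."""
    statements = "\n".join(f"- {target} {attr}" for attr in attributes)
    return (
        "Organize the following statements into a coherent paragraph:\n"
        f"{statements}\n"
    )

def gf_think_prompt(target: str, attributes: list[str]) -> str:
    """A GF-Think-like chain-of-thought wrapper: prompt the model to reason about
    group fairness step by step before producing its final answer."""
    return (
        statement_organization_prompt(target, attributes)
        + "\nBefore answering, think step by step:\n"
        "1. Which statements generalize a stereotype to the whole group?\n"
        "2. Would the paragraph treat other social groups the same way?\n"
        "3. Rewrite or drop unfair statements, then give the final paragraph.\n"
    )

if __name__ == "__main__":
    print(gf_think_prompt(TARGET, ATTRIBUTES))
```

In this sketch, the plain statement-organization prompt serves as the bias probe, while the chain-of-thought wrapper illustrates how fairness-oriented reasoning steps could be prepended to steer generation, in the spirit of the GF-Think method described above.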
Authors: Guanqun Bi, Lei Shen, Yuqiang Xie, Yanan Cao, Tiangang Zhu, Xiaodong He