A Group Fairness Lens for Large Language Models (2312.15478v1)

Published 24 Dec 2023 in cs.CL

Abstract: The rapid advancement of LLMs has revolutionized many applications but has also raised serious concerns about their potential to perpetuate biases and unfairness when deployed in social media contexts. Evaluating the biases and fairness of LLMs has therefore become crucial, yet existing methods rely on limited prompts that cover only a few groups and lack a comprehensive categorical perspective. In this paper, we propose evaluating LLM biases through a group fairness lens, using a novel hierarchical schema that characterizes diverse social groups. Specifically, we construct a dataset, GFair, encapsulating target-attribute combinations across multiple dimensions. In addition, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate these biases from a group fairness perspective, we pioneer GF-Think, a novel chain-of-thought method, and experimental results demonstrate its efficacy in reducing bias in LLMs.
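
The abstract describes GF-Think only at a high level, so the sketch below illustrates what a chain-of-thought mitigation prompt in that spirit might look like, applied to a statement-organization-style input. The `query_llm` helper, the prompt wording, and the three reasoning steps are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch of a chain-of-thought (CoT) bias-mitigation prompt in the
# spirit of GF-Think. The prompt text, the reasoning steps, and query_llm
# are illustrative assumptions, not the paper's published implementation.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError("wire this up to your LLM client")

def gf_think(task: str, statements: list[str], groups: list[str]) -> str:
    # Present the statements and the social groups under consideration,
    # then ask the model to reason about group fairness before answering.
    bullet_list = "\n".join(f"- {s}" for s in statements)
    prompt = (
        f"Task: {task}\n"
        f"Statements:\n{bullet_list}\n"
        f"Relevant social groups: {', '.join(groups)}\n"
        "Think step by step:\n"
        "1. For each statement, note the group and attribute it implies.\n"
        "2. Check whether any association relies on a stereotype; if so, revise it.\n"
        "3. Produce a final answer that treats all groups consistently."
    )
    return query_llm(prompt)

# Example usage with hypothetical inputs:
# answer = gf_think(
#     task="Organize each statement under the group it describes, avoiding bias.",
#     statements=["They are good at math.", "They are bad drivers."],
#     groups=["group A", "group B"],
# )
```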

Authors (6)
  1. Guanqun Bi (11 papers)
  2. Lei Shen (91 papers)
  3. Yuqiang Xie (18 papers)
  4. Yanan Cao (34 papers)
  5. Tiangang Zhu (2 papers)
  6. Xiaodong He (162 papers)
Citations (3)