
Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models (2405.11048v1)

Published 17 May 2024 in cs.HC

Abstract: An essential aspect of evaluating LLMs is identifying potential biases. This is especially relevant considering the substantial evidence that LLMs can replicate human social biases in their text outputs and further influence stakeholders, potentially amplifying harm to already marginalized individuals and communities. Therefore, recent efforts in bias detection have invested in automated benchmarks and objective metrics such as accuracy (i.e., an LLM's output is compared against a predefined ground truth). Nonetheless, social biases can be nuanced, oftentimes subjective and context-dependent, where a situation is open to interpretation and there is no ground truth. While these situations can be difficult for automated evaluation systems to identify, human evaluators could potentially pick up on these nuances. In this paper, we discuss the role of human evaluation and subjective interpretation to augment automated processes when identifying biases in LLMs as part of a human-centred approach to evaluating these models.
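
The contrast the abstract draws, scoring against a single predefined ground truth versus treating bias judgments as subjective and open to disagreement, can be illustrated with a minimal sketch (not from the paper; the toy data and helper names are assumptions for illustration only). An accuracy-style benchmark collapses each item to one gold label, while a human-centred pass keeps every annotator's label and flags items where raters disagree, which are the nuanced cases the authors argue automated metrics may miss.

```python
from collections import Counter

# Illustrative toy items (assumed, not from the paper): each has a benchmark
# "gold" label and several independent human annotations.
items = [
    {"id": 1, "model_output_label": "not_biased", "gold": "not_biased",
     "human_labels": ["not_biased", "biased", "depends"]},
    {"id": 2, "model_output_label": "biased", "gold": "biased",
     "human_labels": ["biased", "biased", "biased"]},
]

def benchmark_accuracy(items):
    """Automated-style metric: fraction of items matching the single gold label."""
    correct = sum(it["model_output_label"] == it["gold"] for it in items)
    return correct / len(items)

def disagreement(labels):
    """Share of annotators who differ from the majority label (0 = full agreement)."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1 - majority_count / len(labels)

print(f"benchmark accuracy: {benchmark_accuracy(items):.2f}")
for it in items:
    d = disagreement(it["human_labels"])
    if d > 0:
        # High-disagreement items are candidates for human review rather than
        # being forced into a single ground-truth verdict.
        print(f"item {it['id']}: annotator disagreement {d:.2f} -> flag for review")
```

Note that item 1 scores as "correct" under accuracy even though the annotators split three ways; surfacing that split, rather than discarding it through majority voting, is the kind of subjective signal the paper argues human evaluation should contribute.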

Authors (3)
  1. Paula Akemi Aoyagui (4 papers)
  2. Sharon Ferguson (8 papers)
  3. Anastasia Kuzminykh (13 papers)