The Impact of Unstated Norms in Bias Analysis of Language Models (2404.03471v3)

Published 4 Apr 2024 in cs.CL, cs.CY, and cs.LG

Abstract: Bias in LLMs has many forms, from overt discrimination to implicit stereotypes. Counterfactual bias evaluation is a widely used approach to quantifying bias and often relies on template-based probes that explicitly state group membership. It measures whether the outcome of a task, performed by an LLM, is invariant to a change of group membership. In this work, we find that template-based probes can lead to unrealistic bias measurements. For example, LLMs appear to mistakenly cast text associated with White race as negative at higher rates than other groups. We hypothesize that this arises artificially via a mismatch between commonly unstated norms, in the form of markedness, in the pretraining text of LLMs (e.g., Black president vs. president) and templates used for bias measurement (e.g., Black president vs. White president). The findings highlight the potential misleading impact of varying group membership through explicit mention in counterfactual bias quantification.
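The counterfactual, template-based probing described in the abstract can be illustrated with a short sketch. The snippet below is a hypothetical, minimal example rather than the authors' code: it assumes an off-the-shelf Hugging Face sentiment pipeline as a stand-in for the evaluated LLM, and the template and group terms are illustrative. It fills a template with explicit group mentions and also includes an unmarked variant, reflecting the paper's point that pretraining text typically leaves the default group unstated.

```python
# Minimal sketch of template-based counterfactual bias probing (illustrative only).
# Assumptions: a generic sentiment-analysis pipeline stands in for the LLM under
# evaluation; the template and group list are hypothetical examples.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

template = "The {} president gave a speech about the economy."
groups = ["Black", "White", "Asian"]

# Marked probes: group membership stated explicitly, as in template-based evaluation.
probes = {g: template.format(g) for g in groups}
# Unmarked probe: the commonly unstated norm (no group mention), per the paper's hypothesis.
probes["<unmarked>"] = "The president gave a speech about the economy."

for group, text in probes.items():
    result = classifier(text)[0]
    print(f"{group:>10}: {result['label']} (score={result['score']:.3f})")

# A counterfactual bias measurement would check whether the predicted label/score is
# invariant to swapping the group term. The paper argues that comparing only marked
# variants (e.g., "Black president" vs. "White president") can yield misleading bias
# estimates, because the unmarked form is what usually appears in pretraining text.
```

Comparing the marked probes against the unmarked one makes the paper's hypothesized mismatch visible: the evaluation templates always mark group membership, while the model's training distribution usually does not.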

Authors (5)
  1. Farnaz Kohankhaki (3 papers)
  2. Jacob-Junqi Tian (9 papers)
  3. Laleh Seyyed-Kalantari (10 papers)
  4. Faiza Khan Khattak (10 papers)
  5. D. B. Emerson (11 papers)
Citations (1)