The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks (2210.10040v2)

Published 18 Oct 2022 in cs.CL, cs.CY, cs.LG, and cs.SI

Abstract: How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given LLM? In this work, we study this question by contrasting social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye. To do so, we empirically simulate various alternative constructions for a given benchmark based on innocuous modifications (such as paraphrasing or random-sampling) that maintain the essence of their social bias. On two well-known social bias benchmarks (Winogender and BiasNLI) we observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models. We hope these troubling observations motivate more robust measures of social biases.
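As an illustration only (not the authors' released code), the sketch below shows one way a "random-sampling" perturbation of a bias benchmark could be simulated: re-score a model on many randomly sub-sampled constructions of the same dataset and measure how much the bias score moves. The names `simulate_constructions`, `bias_score_fn`, and the toy scorer are hypothetical placeholders, not an API from the paper.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Tuple


def simulate_constructions(
    benchmark: List[str],
    bias_score_fn: Callable[[List[str]], float],
    n_trials: int = 20,
    sample_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[float, float]:
    """Re-score a model on many randomly sub-sampled versions of a benchmark.

    A large spread across trials suggests the reported bias score depends
    heavily on which instances happened to be included, i.e. on dataset
    construction choices rather than on the model's social bias alone.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        subset = rng.sample(benchmark, k=int(sample_frac * len(benchmark)))
        scores.append(bias_score_fn(subset))
    return mean(scores), stdev(scores)


# Toy usage with a stand-in scorer; a real study would plug in a model's
# bias metric (e.g. a Winogender accuracy gap or a BiasNLI neutrality rate).
if __name__ == "__main__":
    toy_benchmark = [f"template_{i}" for i in range(500)]
    toy_scorer = lambda examples: sum(hash(e) % 100 for e in examples) / (100 * len(examples))
    avg, spread = simulate_constructions(toy_benchmark, toy_scorer)
    print(f"mean bias score = {avg:.3f}, std across constructions = {spread:.3f}")
```

A similar loop could swap the sub-sampling step for paraphrasing or other shallow edits that preserve the social-bias content of each instance.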

Authors (5)
  1. Nikil Roashan Selvam (5 papers)
  2. Sunipa Dev (28 papers)
  3. Daniel Khashabi (83 papers)
  4. Tushar Khot (53 papers)
  5. Kai-Wei Chang (292 papers)
Citations (24)
