Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting (2401.15585v1)

Published 28 Jan 2024 in cs.CL

Abstract: There exist both scalable tasks, like reading comprehension and fact-checking, where model performance improves with model size, and unscalable tasks, like arithmetic reasoning and symbolic reasoning, where model performance does not necessarily improve with model size. LLMs equipped with Chain-of-Thought (CoT) prompting are able to make accurate incremental predictions even on unscalable tasks. Unfortunately, despite their exceptional reasoning abilities, LLMs tend to internalize and reproduce discriminatory societal biases. Whether CoT can provide discriminatory or egalitarian rationalizations for the implicit information in unscalable tasks remains an open question. In this study, we examine the impact of LLMs' step-by-step predictions on gender bias in unscalable tasks. For this purpose, we construct a benchmark for an unscalable task in which the LLM is given a list of words comprising feminine, masculine, and gendered occupational words, and is required to count the number of feminine and masculine words. In our CoT prompts, we require the LLM to explicitly indicate whether each word in the list is feminine or masculine before making its final prediction. Because it involves both counting and handling the meanings of words, this benchmark has characteristics of both arithmetic reasoning and symbolic reasoning. Experimental results in English show that without step-by-step prediction, most LLMs make socially biased predictions, despite the task being as simple as counting words. Interestingly, CoT prompting reduces this unconscious social bias in LLMs and encourages fair predictions.
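
The abstract describes the benchmark only at a high level. The following is a minimal sketch of how such a word-counting prompt could be constructed, with and without the per-word CoT instruction; the word lists, prompt wording, and function names are illustrative assumptions, not the authors' exact setup.

```python
import random

# Hypothetical word lists; the paper's actual lists of feminine, masculine,
# and gendered occupational words are not reproduced here.
FEMININE = ["she", "mother", "queen"]
MASCULINE = ["he", "father", "king"]
OCCUPATIONS = ["nurse", "engineer", "teacher"]  # gendered occupational words


def build_prompt(words: list[str], use_cot: bool) -> str:
    """Ask a model to count the feminine and masculine words in `words`."""
    prompt = (
        "Count how many of the following words are feminine and how many are "
        f"masculine: {', '.join(words)}.\n"
    )
    if use_cot:
        # CoT variant: require an explicit gender label for each word
        # before the final counts, as described in the abstract.
        prompt += (
            "First, state for each word whether it is feminine or masculine, "
            "then give the final counts.\n"
        )
    return prompt + "Answer with the number of feminine words and the number of masculine words."


if __name__ == "__main__":
    words = random.sample(FEMININE + MASCULINE + OCCUPATIONS, k=6)
    print(build_prompt(words, use_cot=True))
```

A prompt like this could then be sent to any LLM under test; comparing the counts returned with `use_cot=False` versus `use_cot=True` mirrors the abstract's comparison between direct and step-by-step prediction.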

