On The Role of Reasoning in the Identification of Subtle Stereotypes in Natural Language (2308.00071v3)

Published 24 Jul 2023 in cs.CL, cs.AI, cs.CY, and cs.LG

Abstract: LLMs are trained on vast, uncurated datasets containing various forms of bias and language that reinforces harmful stereotypes, which the models may subsequently inherit. It is therefore essential to examine and address bias in LLMs, integrating fairness into their development so that these models do not perpetuate social biases. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification across several open-source LLMs. Accurately identifying stereotypical language is a complex task that requires a nuanced understanding of social structures, biases, and existing unfair generalizations about particular groups. While model scaling improves accuracy, reasoning, especially multi-step reasoning, is crucial for consistent performance. Additionally, through a qualitative analysis of selected reasoning traces, we highlight how reasoning improves not only the accuracy but also the interpretability of model decisions. This work establishes reasoning as a critical component of automatic stereotype detection and is a first step toward stronger stereotype-mitigation pipelines for LLMs.
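To make the contrast concrete, here is a minimal sketch of the two prompting regimes the abstract describes: a direct zero-shot query versus a zero-shot chain-of-thought query that elicits multi-step reasoning before the verdict. The model choice, prompt wording, example sentence, and decoding settings are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch contrasting zero-shot and chain-of-thought prompts for
# stereotype identification. Model name, prompts, and decoding settings are
# illustrative assumptions, not the paper's exact setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="lmsys/vicuna-7b-v1.5",  # assumed open-source chat LLM; swap in any other
)

sentence = "The nurse said she would be back in an hour."  # hypothetical input

# Direct zero-shot prompt: ask for a verdict with no intermediate reasoning.
zero_shot_prompt = (
    "Does the following sentence contain or reinforce a stereotype? "
    "Answer 'Yes' or 'No'.\n"
    f"Sentence: {sentence}\nAnswer:"
)

# Zero-shot chain-of-thought prompt: elicit multi-step reasoning before the verdict.
cot_prompt = (
    "Does the following sentence contain or reinforce a stereotype?\n"
    f"Sentence: {sentence}\n"
    "Let's think step by step, then answer 'Yes' or 'No'."
)

for name, prompt in [("zero-shot", zero_shot_prompt), ("chain-of-thought", cot_prompt)]:
    out = generator(prompt, max_new_tokens=200, do_sample=False)
    print(f"--- {name} ---")
    print(out[0]["generated_text"])
```

Under the paper's findings, the second prompt style would be expected to yield more consistent verdicts, and its intermediate reasoning can be inspected for interpretability.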
