Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis (2407.02030v1)
Abstract: LLMs perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology, for debiasing LLMs. We simulate various forms of social contact through LLM prompting and measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudice in social contexts. We create a dataset of 108,000 prompts following a principled approach that replicates social contact, and use it to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a novel debiasing technique, Social Contact Debiasing (SCD), which instruction-tunes these models with unbiased responses to those prompts. Our research demonstrates that LLM responses exhibit social biases when subjected to contact probing, and, more importantly, that these biases can be reduced by up to 40% after a single epoch of instruction tuning LLaMA 2 with our SCD strategy. Our code and data are available at https://github.com/chahatraj/breakingbias.
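For concreteness, below is a minimal sketch of what SCD-style instruction tuning could look like with Hugging Face transformers: each training example pairs a contact-style prompt (a simulated intergroup interaction plus a choice among social groups) with an unbiased target response, and the loss is computed only over the response. The prompt/response pair, checkpoint name, and hyperparameters here are illustrative assumptions, not the paper's actual 108,000-prompt dataset or training configuration; see the linked repository for those.

```python
# Hypothetical sketch of Social Contact Debiasing (SCD)-style instruction
# tuning. All example text, the checkpoint name, and hyperparameters are
# illustrative assumptions, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint (gated on the HF Hub)

# Hypothetical (prompt, unbiased response) pair imitating a contact-style probe.
pairs = [
    (
        "You are working on a group project with a colleague who is an "
        "immigrant. Who do you think will miss the deadline? "
        "Answer: (a) your colleague (b) you (c) cannot be determined.",
        "(c) cannot be determined.",
    ),
]

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.train()
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative learning rate

for prompt, target in pairs:  # one pass over the data = the paper's single epoch
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + " " + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # supervise only the unbiased response
    loss = model(input_ids=full_ids, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Masking the prompt tokens with -100 restricts the cross-entropy loss to the unbiased response, which is the standard supervised instruction-tuning objective; in practice one would batch the examples, move the model to a GPU, and likely use parameter-efficient tuning for a 7B model.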
Authors: Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu