Does Debiasing Inevitably Degrade the Model Performance (2211.07350v2)
Abstract: Gender bias in large language models (LLMs) has attracted considerable attention because it threatens social justice. However, most current debiasing methods degrade the model's performance on other tasks, and the mechanism behind this degradation remains poorly understood. We propose a theoretical framework that explains three candidate mechanisms of gender bias in LLMs, and we use this framework to explain why current debiasing methods cause performance degradation. We also identify a pathway through which debiasing need not degrade model performance, and we further develop a causality-detection fine-tuning approach to correct gender bias. Numerical experiments demonstrate that our method yields a double dividend: it partially mitigates gender bias while avoiding performance degradation.
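The paper's own evaluation protocol is not reproduced here, but the following is a minimal sketch of the kind of probe commonly used to quantify gender bias in a causal language model: comparing the next-token probabilities the model assigns to "he" versus "she" after occupation prompts. The model choice (GPT-2 via the HuggingFace transformers library) and the prompt templates are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: probe gender bias in a causal LM by comparing the
# probabilities it assigns to " he" vs. " she" as the next token after
# occupation prompts. Model and templates are illustrative, not the
# paper's protocol.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def pronoun_logprobs(prompt: str) -> dict:
    """Return log-probabilities of ' he' and ' she' as the next token."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # next-token logits
    logprobs = torch.log_softmax(logits, dim=-1)
    # " he" and " she" are each a single token in GPT-2's vocabulary.
    return {
        word.strip(): logprobs[tokenizer.encode(word)[0]].item()
        for word in (" he", " she")
    }

for occupation in ("nurse", "engineer"):
    lp = pronoun_logprobs(f"The {occupation} said that")
    # A positive gap means the model prefers "he" in this context.
    print(occupation, lp["he"] - lp["she"])
```

A systematic pattern in this gap across occupations (positive for stereotypically male roles, negative for stereotypically female ones) is the kind of asymmetry that debiasing methods target; the question the paper raises is whether removing it must also hurt performance on unrelated tasks.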
- Yiran Liu
- Xiao Liu
- Haotian Chen
- Yang Yu