BLIND: Bias Removal With No Demographics (2212.10563v2)
Abstract: Models trained on real-world data tend to imitate and amplify social biases. Common methods to mitigate biases require prior information on the types of biases that should be mitigated (e.g., gender or racial bias) and the social groups associated with each data sample. In this work, we introduce BLIND, a method for bias removal with no prior knowledge of the demographics in the dataset. While training a model on a downstream task, BLIND detects biased samples using an auxiliary model that predicts the main model's success, and down-weights those samples during the training process. Experiments with racial and gender biases in sentiment classification and occupation classification tasks demonstrate that BLIND mitigates social biases without relying on a costly demographic annotation process. Our method is competitive with other methods that require demographic information and sometimes even surpasses them.
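The abstract compresses the training procedure into one sentence; the sketch below unpacks it. This is a minimal, hypothetical PyTorch rendering, not the paper's implementation: the names (BlindStep, success_head, gamma) are illustrative, and the paper's exact loss and architecture may differ. The idea it shows is the one the abstract states: an auxiliary head is trained to predict whether the main model succeeds on each sample, and samples whose success is confidently predictable are down-weighted in the main-task loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlindStep(nn.Module):
    """Sketch of one BLIND-style training step (hypothetical names/details)."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, num_labels: int,
                 gamma: float = 2.0):
        super().__init__()
        self.encoder = encoder                              # shared text encoder
        self.classifier = nn.Linear(hidden_dim, num_labels)  # main task head
        self.success_head = nn.Linear(hidden_dim, 1)          # auxiliary success predictor
        self.gamma = gamma                                    # down-weighting strength

    def forward(self, inputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        h = self.encoder(inputs)          # assumed shape: [batch, hidden_dim]
        logits = self.classifier(h)       # main-task logits

        # Record whether the main model currently gets each sample right.
        with torch.no_grad():
            correct = (logits.argmax(dim=-1) == labels).float()

        # Train the auxiliary head to predict that success signal.
        # detach() keeps the auxiliary gradient from shaping the encoder.
        s = torch.sigmoid(self.success_head(h.detach())).squeeze(-1)
        aux_loss = F.binary_cross_entropy(s, correct)

        # Down-weight samples whose success is predictable: the weight
        # approaches 0 as predicted success s approaches 1 (focal-style).
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        weights = (1.0 - s.detach()) ** self.gamma
        main_loss = (weights * per_sample).mean()

        return main_loss + aux_loss

# Toy usage: random features stand in for encoded text.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
step = BlindStep(encoder, hidden_dim=64, num_labels=2)
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss = step(x, y)
loss.backward()
```

The design constraint the abstract emphasizes is preserved here: the success predictor sees only the sample representation, never demographic labels, so no annotation of protected attributes is needed.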
Authors: Hadas Orgad, Yonatan Belinkov