Does Differential Privacy Impact Bias in Pretrained NLP Models? (2410.18749v1)
Abstract: Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples. While most DP research has focused on improving a model's privacy-utility tradeoff, some studies have found that DP can be unfair to or biased against underrepresented groups. In this work, we show the impact of DP on bias in LLMs through empirical analysis. Differentially private training can increase model bias against protected groups with respect to AUC-based bias metrics: DP makes it harder for the model to differentiate between positive and negative examples drawn from the protected groups and from the rest of the population. Our results also show that the impact of DP on bias depends not only on the privacy protection level but also on the underlying distribution of the dataset.
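The AUC-based bias metrics referenced in the abstract are, in text-classification work, typically the subgroup AUC, BPSN (background positive, subgroup negative) AUC, and BNSP (background negative, subgroup positive) AUC. The sketch below is a minimal, illustrative implementation (not the paper's own evaluation code), assuming binary toxicity labels, model scores, and a boolean mask marking examples that mention the protected group; a drop in these AUCs under DP training reflects the separability loss the abstract describes.

```python
# Minimal sketch of AUC-based bias metrics (subgroup, BPSN, BNSP AUC).
# Assumptions: binary labels (0/1), real-valued model scores, and a boolean
# mask `in_subgroup` marking protected-group examples. Illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_bias_metrics(labels, scores, in_subgroup):
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    g = np.asarray(in_subgroup, dtype=bool)

    # Subgroup AUC: positives vs. negatives within the protected group only.
    subgroup_auc = roc_auc_score(labels[g], scores[g])

    # BPSN AUC: background positives vs. subgroup negatives.
    bpsn_mask = (~g & (labels == 1)) | (g & (labels == 0))
    bpsn_auc = roc_auc_score(labels[bpsn_mask], scores[bpsn_mask])

    # BNSP AUC: background negatives vs. subgroup positives.
    bnsp_mask = (~g & (labels == 0)) | (g & (labels == 1))
    bnsp_auc = roc_auc_score(labels[bnsp_mask], scores[bnsp_mask])

    return {"subgroup_auc": subgroup_auc,
            "bpsn_auc": bpsn_auc,
            "bnsp_auc": bnsp_auc}
```

Comparing these metrics between a non-private baseline and DP-SGD fine-tuned models at different privacy budgets is one straightforward way to quantify the bias effect the abstract reports.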