Are Bias Mitigation Techniques for Deep Learning Effective? (2104.00170v4)

Published 1 Apr 2021 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms of bias, and systems have access to hidden knowledge or are tuned specifically to the test set. To address this, we introduce an improved evaluation protocol, sensible metrics, and a new dataset, which enables us to ask and answer critical questions about bias mitigation algorithms. We evaluate seven state-of-the-art algorithms using the same network architecture and hyperparameter selection policy across three benchmark datasets. We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources. We use Biased MNIST and a visual question answering (VQA) benchmark to assess robustness to hidden biases. Rather than only tuning to the test set distribution, we study robustness across different tuning distributions, which is critical because for many applications the test distribution may not be known during development. We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set. Based on our findings, we implore the community to adopt more rigorous assessment of future bias mitigation methods. All data, code, and results are publicly available at: https://github.com/erobic/bias-mitigators.
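The evaluation protocol described in the abstract hinges on measuring accuracy separately on bias-aligned and bias-conflicting examples, rather than a single aggregate number. The sketch below is an illustrative (not the authors') implementation of that idea: given labels, a spurious attribute, and predictions, it reports accuracy on the majority groups (where the spurious attribute agrees with the label) and on the minority groups (where it does not). The function name and the toy data are hypothetical.

```python
import numpy as np

def group_accuracies(labels, spurious, preds):
    """Split examples into bias-aligned (spurious attribute matches the
    label) and bias-conflicting groups, and report accuracy on each.
    Low minority-group accuracy signals reliance on the spurious cue."""
    labels = np.asarray(labels)
    spurious = np.asarray(spurious)
    preds = np.asarray(preds)
    majority = spurious == labels  # bias-aligned examples
    minority = ~majority           # bias-conflicting examples

    def acc(mask):
        return float((preds[mask] == labels[mask]).mean()) if mask.any() else float("nan")

    return {"majority_acc": acc(majority), "minority_acc": acc(minority)}

# Toy example: a "classifier" that just predicts the spurious attribute
# is perfect on bias-aligned examples and fails on all the others,
# even though its overall accuracy (4/6) looks respectable.
labels   = [0, 0, 1, 1, 1, 0]
spurious = [0, 1, 1, 1, 0, 0]   # correlates with the label on 4 of 6 examples
preds    = spurious             # bias-exploiting predictor
print(group_accuracies(labels, spurious, preds))
# → {'majority_acc': 1.0, 'minority_acc': 0.0}
```

This kind of per-group breakdown is what exposes a model that "exploits hidden biases": aggregate accuracy can stay high while minority-group accuracy collapses.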

Authors (3)
  1. Robik Shrestha
  2. Kushal Kafle
  3. Christopher Kanan
Citations (12)