A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models (2401.03695v2)
Abstract: Fairness is a critical issue affecting the adoption of deep learning models in practice. Many methods have been proposed to improve model fairness and shown to be effective in their own contexts. However, there has been no systematic evaluation comparing them under the same conditions, which makes it hard to understand their performance differences and hinders both research progress and practical adoption. To fill this gap, this paper conducts the first large-scale empirical study to comprehensively compare the performance of existing state-of-the-art fairness-improving techniques. Specifically, we target the widely used application scenario of image classification, using three different datasets and five commonly used performance metrics to assess 13 methods from diverse categories. Our findings reveal substantial variations in the performance of each method across datasets and sensitive attributes, indicating that many existing methods over-fit to specific datasets. Furthermore, different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results. Overall, we observe that pre-processing and in-processing methods outperform post-processing methods, with pre-processing methods performing best. Our empirical study offers comprehensive recommendations for enhancing fairness in deep learning models. We approach the problem from multiple dimensions, aiming to provide a uniform evaluation platform and to inspire researchers to explore more effective fairness solutions via a set of implications.
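The abstract's point that different fairness metrics yield different assessments can be illustrated with a small sketch. The example below (an illustration, not the paper's evaluation code; the toy data is invented) computes two standard group-fairness metrics, statistical parity difference and equal opportunity difference, on the same predictions and shows they can disagree:

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between the two groups.

    Focuses only on outcomes, ignoring ground-truth labels.
    """
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_difference(y_true, y_pred, group):
    """Absolute gap in true-positive rates (recall) between the two groups.

    Focuses on errors among the truly positive instances.
    """
    tprs = []
    for g in (0, 1):
        positives = (group == g) & (y_true == 1)
        tprs.append(y_pred[positives].mean())
    return abs(tprs[0] - tprs[1])

# Toy binary predictions over two demographic groups (0 and 1).
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

spd = statistical_parity_difference(y_pred, group)          # 0.0: equal positive rates
eod = equal_opportunity_difference(y_true, y_pred, group)   # 0.5: unequal recall
```

Here both groups receive positive predictions at the same rate (statistical parity difference of 0), yet the classifier recovers only half of group 0's true positives versus all of group 1's (equal opportunity difference of 0.5), so the two metrics reach opposite conclusions about the same model.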
Authors: Junjie Yang, Jiajun Jiang, Zeyu Sun, Junjie Chen