Systematic analysis of the impact of label noise correction on ML Fairness (2306.15994v1)

Published 28 Jun 2023 in cs.LG and cs.CY

Abstract: Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may exhibit bias with respect to sensitive attributes such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used not only with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness, while Clustering-Based Correction reduces discrimination the most, though at the cost of lower predictive performance.
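
The abstract outlines the evaluation loop at a high level; the sketch below illustrates one possible instantiation of it, assuming scikit-learn, a synthetic dataset with a binary sensitive attribute in place of the OpenML benchmarks, group-dependent label flipping as the noise model, a toy confidence-based relabeling step standing in for the six correction methods studied in the paper, and demographic parity difference as a single fairness metric. All of these choices are illustrative assumptions, not the authors' implementation.

# Hedged sketch (assumed, not from the paper): inject a controlled amount of
# group-dependent label noise, optionally apply a label noise correction step,
# train a classifier, and report predictive performance plus a fairness metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data with a binary sensitive attribute (placeholder for OpenML datasets).
X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
s = rng.integers(0, 2, size=len(y))  # sensitive attribute: group 0 vs group 1

def inject_bias(y, s, rate=0.3):
    # Flip positive labels of group 0 with the given rate (one possible noise model).
    y_noisy = y.copy()
    flip = (s == 0) & (y == 1) & (rng.random(len(y)) < rate)
    y_noisy[flip] = 0
    return y_noisy

def correct_labels(X, y_noisy, threshold=0.9):
    # Toy confidence-based relabeling, a stand-in for the correction methods
    # analyzed in the paper (e.g. Hybrid or Clustering-Based Correction).
    proba = LogisticRegression(max_iter=1000).fit(X, y_noisy).predict_proba(X)[:, 1]
    y_corr = y_noisy.copy()
    y_corr[(proba > threshold) & (y_noisy == 0)] = 1
    y_corr[(proba < 1 - threshold) & (y_noisy == 1)] = 0
    return y_corr

def demographic_parity_diff(y_pred, s):
    # Absolute difference in positive prediction rates between the two groups.
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

y_biased = inject_bias(y, s, rate=0.3)
X_tr, X_te, yb_tr, _, _, y_te, _, s_te = train_test_split(
    X, y_biased, y, s, test_size=0.3, random_state=0)

for name, labels in [("biased", yb_tr), ("corrected", correct_labels(X_tr, yb_tr))]:
    clf = RandomForestClassifier(random_state=0).fit(X_tr, labels)
    pred = clf.predict(X_te)
    acc = (pred == y_te).mean()            # evaluated against the clean labels
    dpd = demographic_parity_diff(pred, s_te)
    print(f"{name:9s}  accuracy={acc:.3f}  demographic parity diff={dpd:.3f}")

Comparing the two printed lines shows the intended trade-off measurement: correction should recover accuracy against the clean labels while shrinking the gap in positive prediction rates between groups.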

Authors (4)
  1. I. Oliveira e Silva (1 paper)
  2. C. Soares (2 papers)
  3. I. Sousa (3 papers)
  4. R. Ghani (1 paper)
Citations (1)
