An Empirical Study on Bias in Machine Learning Models on Crowd-Sourced Platforms
The paper "Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness" by Sumon Biswas and Hridesh Rajan offers a meticulous empirical evaluation of bias in ML models from a practical standpoint. The authors provide a comprehensive analysis of fairness in ML models, leveraging a benchmark assembled from 40 models available on the Kaggle platform, across five distinct datasets. This endeavor addresses the critical need to understand biases inherent in practical ML deployments, with a focus on the implications of various bias mitigation strategies.
Key Study Aspects
The core of the paper revolves around evaluating fairness across a variety of ML models selected from Kaggle, taking into account multiple fairness metrics and mitigation techniques. The authors delineate three primary research questions:
- Unfairness Prevalence: The extent to which existing ML models exhibit bias and the contributing factors.
- Bias Mitigation: Strategies for identifying and addressing root causes of bias within ML models.
- Impact Assessment: The effects of implementing various bias mitigation techniques on model performance.
Methodology
Benchmark Formation and Model Assessment:
The authors constructed a benchmark of ML models from Kaggle, selecting datasets that involve protected attributes (such as sex and age) and that have historically been used in fairness research. Each model was uniformly processed and evaluated under a standardized experimental setup. The datasets were German Credit, Adult Census, Bank Marketing, Home Credit, and Titanic.
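For concreteness, the sketch below shows what such a standardized setup might look like in Python. The function name, the logistic-regression stand-in model, and the 70/30 split are illustrative assumptions, not the authors' exact pipeline, and numeric, already-encoded features are assumed.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def evaluate_uniformly(df: pd.DataFrame, label: str, protected: str, seed: int = 42):
    """Train and score a model under a fixed split and seed, so that every
    benchmark model is judged on identical data. Returns test labels,
    predictions, and the protected attribute for fairness analysis."""
    X, y = df.drop(columns=[label]), df[label]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return y_test.to_numpy(), model.predict(X_test), X_test[protected].to_numpy()
```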
Comprehensive Fairness and Performance Evaluation:
The researchers employed a suite of fairness metrics, including disparate impact, statistical parity difference, and equal opportunity difference, to quantify model bias. These metrics were calculated both before and after applying seven bias mitigation algorithms spanning pre-processing, in-processing, and post-processing approaches.
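To make these metrics concrete, here is a minimal sketch computing three of them directly from their standard definitions (the study itself relies on the AIF360 toolkit). It assumes `y_true` and `y_pred` are 0/1 NumPy arrays and `prot` encodes group membership, with 1 taken as the privileged group.

```python
import numpy as np

def statistical_parity_difference(y_pred, prot):
    # P(favorable | unprivileged) - P(favorable | privileged); 0 is fair.
    return y_pred[prot == 0].mean() - y_pred[prot == 1].mean()

def disparate_impact(y_pred, prot):
    # Ratio of favorable-outcome rates; 1.0 is fair, and values
    # below 0.8 violate the classic "four-fifths rule".
    return y_pred[prot == 0].mean() / y_pred[prot == 1].mean()

def equal_opportunity_difference(y_true, y_pred, prot):
    # Gap in true-positive rates between unprivileged and privileged groups.
    tpr = lambda g: y_pred[(prot == g) & (y_true == 1)].mean()
    return tpr(0) - tpr(1)
```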
Findings and Implications
Unveiling Bias:
The analysis revealed that every model investigated displayed some degree of bias. Notably, models optimized chiefly for overall predictive performance often exhibited significant unfairness. Furthermore, the documentation of common ML libraries largely sidelines fairness considerations, underscoring an area that requires attention.
Effectiveness of Mitigation Techniques:
The research highlighted that pre-processing techniques, specifically Reweighing, frequently yielded fairer models without sacrificing performance, particularly when the model itself did not amplify biases already present in the training data. Post-processing methods, though effective at reducing bias, typically decreased model performance, pointing to a trade-off between fairness and predictive accuracy.
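To show why Reweighing is so cheap to apply, here is a minimal sketch of the Kamiran-Calders weighting scheme behind it (AIF360 ships it as a pre-processing algorithm). Binary `prot` and `y` arrays are assumed; the helper name is illustrative.

```python
import numpy as np

def reweighing_weights(prot, y):
    """Kamiran-Calders reweighing: weight each (group, label) cell so that
    group membership and label become statistically independent,
    w(g, l) = P(g) * P(l) / P(g, l)."""
    weights = np.ones(len(y), dtype=float)
    for g in (0, 1):
        for l in (0, 1):
            cell = (prot == g) & (y == l)
            if cell.any():
                expected = (prot == g).mean() * (y == l).mean()
                weights[cell] = expected / cell.mean()  # >1 boosts under-represented cells
    return weights

# The weights plug into any estimator that accepts sample weights, e.g.
# LogisticRegression().fit(X, y, sample_weight=reweighing_weights(prot, y))
```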
Diverse Outcomes Across Metrics:
The paper highlights the complexity of measuring bias, noting that the same model can vary substantially in fairness across different metrics. This underscores the necessity of a multifaceted view of fairness, since reliance on a single metric can mask or exaggerate certain biases.
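A small synthetic example (invented numbers, not the paper's data) makes the point: the same predictions can pass the four-fifths disparate-impact threshold while hiding a large equal-opportunity gap.

```python
import numpy as np

prot   = np.array([1]*10 + [0]*10)               # 1 = privileged group
y_true = np.array([1]*5 + [0]*5 + [1]*5 + [0]*5)
y_pred = np.array([1]*5 + [1] + [0]*4            # privileged: rate 0.6, TPR 1.0
                  + [1,1,0,0,0] + [1,1,1,0,0])   # unprivileged: rate 0.5, TPR 0.4

rate = lambda g: y_pred[prot == g].mean()
tpr  = lambda g: y_pred[(prot == g) & (y_true == 1)].mean()

print(rate(0) / rate(1))  # disparate impact ~0.83 -> "fair" by the 0.8 rule
print(tpr(0) - tpr(1))    # equal opportunity difference -0.6 -> clearly unfair
```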
Future Directions
This paper lays substantial groundwork for future investigations into the intersection of fair ML practices and model performance. The authors call for software engineering (SE) methodologies that bridge theoretical fairness algorithms with practical implementation, aspiring toward effective bias mitigation in real-world ML projects. Additionally, augmenting ML libraries to explicitly accommodate fairness considerations within model training could substantially aid developers in building unbiased ML systems.
In essence, this work shines a light on the pressing need to harmonize theoretical advancements in fairness with actionable tools and practices, enabling the production of equitable ML solutions in practical settings. The empirical findings provide a foundational step toward mitigating bias in ML, advancing fairness-centric SE research and development.