Bias and Fairness in Machine Learning

Updated 2 September 2025
  • Bias and fairness in machine learning are fields focused on diagnosing, quantifying, and mitigating systematic disparities in algorithmic predictions across protected demographics.
  • Research in this area employs data-level and model-level analyses to reveal how historical biases, sampling errors, and proxy features contribute to unfair decision-making.
  • Practical interventions include pre-processing, in-processing, and post-processing strategies that adjust data and models to promote equitable outcomes.

Bias and fairness in machine learning refer to systematic differences in model performance or predictions across subpopulations defined by protected or sensitive attributes such as race, gender, age, or other social identifiers. Although algorithmic decision systems often promise objectivity, their data-driven nature and the sociotechnical context in which they operate make them susceptible to propagating historical inequities or creating new forms of structural disadvantage. The field of fair machine learning has emerged to study, quantify, and mitigate these risks, developing a comprehensive suite of definitions, metrics, algorithmic interventions, and theoretical frameworks to address persistent unfairness in automated decision making.

1. Sources and Mechanisms of Bias in Machine Learning

Bias in machine learning systems arises from both data-level and model-level mechanisms. Data-level bias includes historical, representation, measurement, sampling, and aggregation biases, often introduced before or during data collection, preprocessing, and labeling. For example, historical data may reflect past societal prejudices or discriminatory practices, and measurement tools may yield systematically different quality or completeness across groups (Zhou et al., 2021, Pagano et al., 2022, Londoño et al., 2022). Label bias occurs when the observed ground-truth labels are themselves corrupted, often through group-dependent label flipping, formalized in terms of conditional probabilities:

$$\rho_A = P(Y = 1 \mid S = A, Z = 0), \qquad \rho_B = P(Y = 0 \mid S = B, Z = 1)$$

where $S$ denotes the sensitive attribute and $Z$ the true label (Zhang et al., 2023).
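
For concreteness, here is a minimal synthetic sketch of group-dependent label flipping under this formalization; the flip rates, group proportions, and variable names are illustrative assumptions of the example, not values from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Sensitive attribute S in {"A", "B"} and clean labels Z in {0, 1}.
S = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])
Z = rng.binomial(1, 0.5, size=n)

# Group-dependent flips: rho_A turns some Z=0 cases in group A into Y=1,
# rho_B turns some Z=1 cases in group B into Y=0 (illustrative rates).
rho_A, rho_B = 0.15, 0.25
Y = Z.copy()
flip_A = (S == "A") & (Z == 0) & (rng.random(n) < rho_A)
flip_B = (S == "B") & (Z == 1) & (rng.random(n) < rho_B)
Y[flip_A] = 1
Y[flip_B] = 0

# Empirical estimates should recover rho_A = P(Y=1 | S=A, Z=0) and
# rho_B = P(Y=0 | S=B, Z=1).
print("est rho_A:", Y[(S == "A") & (Z == 0)].mean())
print("est rho_B:", 1 - Y[(S == "B") & (Z == 1)].mean())
```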

Model-level bias can occur during feature engineering (proxy discrimination), algorithm selection, or loss function design, or through the tendency of empirical risk minimization to favor majority groups. When models optimize for accuracy over skewed populations, errors in underrepresented or protected groups carry less weight, exacerbating disparities (Chouldechova et al., 2018). Overfitted or overly complex models may amplify data-level artifacts, inadvertently encoding or intensifying existing prejudices (Zhou et al., 2021).

Another important mechanism is selection bias or systematic censoring, wherein outcomes for certain groups are observed only under restrictive or biased processes. If inclusion probabilities $P(S=1 \mid Y, A)$ differ across protected groups, then fairness metrics computed conditionally on the selected data will not reflect true population-level fairness (Kallus et al., 2018).

2. Formal Fairness Definitions

Fairness in machine learning is formalized through competing statistical, individual, causal, and relative notions. Group fairness metrics evaluate the equivalence of error rates or positive prediction rates across groups defined by sensitive attributes:

  • Demographic/Statistical Parity: $P(\hat{Y}=1 \mid A=0) = P(\hat{Y}=1 \mid A=1)$
  • Equalized Odds: $P(\hat{Y}=1 \mid Y=y, A=0) = P(\hat{Y}=1 \mid Y=y, A=1)$ for each $y$
  • Equal Opportunity: $P(\hat{Y}=1 \mid Y=1, A=0) = P(\hat{Y}=1 \mid Y=1, A=1)$
  • Disparate Impact: $\min\left( \frac{P(\hat{Y}=1 \mid A=0)}{P(\hat{Y}=1 \mid A=1)}, \frac{P(\hat{Y}=1 \mid A=1)}{P(\hat{Y}=1 \mid A=0)} \right)$, with the "80% rule" (a 0.8 threshold) widely used as a cut-off for group parity (Jones et al., 2020, Pagano et al., 2022). A minimal computation sketch follows this list.
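
The following NumPy sketch shows how these group metrics can be computed from binary predictions and a binary sensitive attribute; the function and the toy arrays are illustrative, and production audits would typically rely on a dedicated library such as those discussed in Section 6.

```python
import numpy as np

def group_fairness_report(y_true, y_pred, a):
    """Demographic parity, equal opportunity, average odds, and disparate
    impact for a binary sensitive attribute a in {0, 1}."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    rates = {}
    for g in (0, 1):
        mask = a == g
        rates[g] = {
            "pos_rate": y_pred[mask].mean(),             # P(Yhat=1 | A=g)
            "tpr": y_pred[mask & (y_true == 1)].mean(),  # P(Yhat=1 | Y=1, A=g)
            "fpr": y_pred[mask & (y_true == 0)].mean(),  # P(Yhat=1 | Y=0, A=g)
        }
    p0, p1 = rates[0]["pos_rate"], rates[1]["pos_rate"]
    return {
        "statistical_parity_diff": p0 - p1,
        "equal_opportunity_diff": rates[0]["tpr"] - rates[1]["tpr"],
        "average_odds_diff": 0.5 * (abs(rates[0]["tpr"] - rates[1]["tpr"])
                                    + abs(rates[0]["fpr"] - rates[1]["fpr"])),
        "disparate_impact": min(p0 / p1, p1 / p0),
    }

# Toy example with eight instances.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
a      = [0, 0, 0, 0, 1, 1, 1, 1]
print(group_fairness_report(y_true, y_pred, a))
```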

Individual fairness is operationalized by the principle that similar individuals should be treated similarly, often via task-specific similarity metrics; however, its implementation depends upon the construction or estimation of meaningful and justifiable similarity functions (Mehrabi et al., 2019, Chouldechova et al., 2018).

Causal definitions formalize fairness via counterfactual or path-specific effects in graphical models, distinguishing between "fair" and "unfair" causal pathways (Oneto et al., 2020). For instance, a predictor is counterfactually fair if the outcome does not change under changes to the sensitive attribute along unfair causal paths.

Relative fairness, as in differential parity, compares two sets of decisions and requires their differences to be independent of sensitive attributes. It quantifies fairness as the equality of mean differences across groups, avoiding ambiguous absolute standards (Yu et al., 2021).

It is widely recognized that these notions are mutually incompatible except in trivial settings (impossibility theorems), necessitating context- and application-specific prioritization (Chouldechova et al., 2018, Caton et al., 2020).

3. Fairness Evaluation and Metrics

A rich suite of fairness metrics—derived from confusion matrices or statistical properties—enables quantitative assessment of unfairness:

| Metric | Formal Expression | Captures |
| --- | --- | --- |
| Statistical Parity Difference | $P(\hat{Y}=1 \mid A=0) - P(\hat{Y}=1 \mid A=1)$ | Group-level selection bias |
| Average Odds Difference | $0.5\left( \lvert TPR_0 - TPR_1 \rvert + \lvert FPR_0 - FPR_1 \rvert \right)$ | Error parity |
| Equal Opportunity Difference | $TPR_0 - TPR_1$ | Sensitivity parity |
| Disparate Impact | $\min(\dotsc)$ (see above) | Ratio of positive rates |

Individual metrics (e.g., kNN consistency) and mutual information (e.g., $I(E;C)$, the KL divergence between the joint and product distributions of classifier event $E$ and group $C$) quantify similarity of outcomes or information leakage (Mannelli et al., 2022).
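
The mutual-information criterion can be estimated directly from empirical joint frequencies when $E$ and $C$ are discrete; a small sketch follows, with the discretization and variable names being assumptions of the example.

```python
import numpy as np

def mutual_information(e, c):
    """I(E;C) as the KL divergence between the empirical joint distribution
    and the product of its marginals, in nats."""
    e, c = np.asarray(e), np.asarray(c)
    joint = np.zeros((e.max() + 1, c.max() + 1))
    for ei, ci in zip(e, c):
        joint[ei, ci] += 1
    joint /= joint.sum()
    pe = joint.sum(axis=1, keepdims=True)   # marginal over E
    pc = joint.sum(axis=0, keepdims=True)   # marginal over C
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pe @ pc)[nz])).sum())

# E: classifier event (e.g. predicted positive), C: group membership.
print(mutual_information([1, 0, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0]))
```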

Further, calibration—assessed by Brier score or group-wise predictive value alignment—and policy-agnostic metrics (e.g., fair efficiency integrating over threshold and fairness parameters) enable model selection when operational requirements are not predetermined (Jones et al., 2020).
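
Group-wise calibration checks of the kind mentioned above can be as simple as a per-group Brier score; a minimal sketch, with illustrative names:

```python
import numpy as np

def groupwise_brier(y_true, p_hat, a):
    """Brier score computed separately per sensitive group; large gaps
    indicate miscalibration concentrated in one group."""
    y_true, p_hat, a = map(np.asarray, (y_true, p_hat, a))
    return {int(g): float(np.mean((p_hat[a == g] - y_true[a == g]) ** 2))
            for g in np.unique(a)}
```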

4. Methods for Mitigation of Bias

Mitigation strategies are organized as pre-processing, in-processing, and post-processing interventions (Caton et al., 2020, Pagano et al., 2022):

  • Pre-processing: Adjust data via re-sampling, reweighting (often using inverse probability weighting, $w = 1/P(S=1 \mid X, A)$; see the reweighting sketch after this list), massaging of labels, representation learning, or feature editing. Novel data audit frameworks filter pseudo-label noise or bias-inducing geometries, removing instances where the protected attribute alone drives label differences (Chaudhari et al., 2022, Mannelli et al., 2022).
  • In-processing: Integrate fairness directly into learning objectives using regularizers or constraints (e.g., hard constraints enforcing fairness, soft constraints penalizing deviations, or adversarial debiasing where an adversary predicts the sensitive attribute from intermediate model representations). Meta-learning frameworks further adapt regularization weights dynamically, as in B-FARL, to reconcile group fairness and label noise (Zhang et al., 2021).
  • Post-processing: Calibrate, shift, or threshold predictions to enforce group parity, as in Reject Option Classification (randomizing predictions for near-boundary instances in unprivileged groups) or population sensitivity-guided threshold adjustments (Dang et al., 2022).
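
As one illustration of the pre-processing branch, the sketch below estimates inclusion probabilities with a logistic model and uses their inverses as sample weights, in the spirit of the inverse probability weighting noted above; the synthetic data and feature layout are assumptions of the example, not a prescribed pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic features X, sensitive attribute A, label Y, and selection flag S.
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 0.4, size=n)
Y = (X[:, 0] + 0.5 * A + rng.normal(scale=0.5, size=n) > 0).astype(int)
S = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 1] + 1.5 * A))))  # biased inclusion

# Step 1: model P(S=1 | X, A), then weight selected units by 1/p.
sel_model = LogisticRegression().fit(np.column_stack([X, A]), S)
p_inc = sel_model.predict_proba(np.column_stack([X, A]))[:, 1]
w = 1.0 / np.clip(p_inc, 1e-3, None)

# Step 2: train the downstream classifier on selected data only, reweighted so
# that under-included groups regain influence.
idx = S == 1
clf = LogisticRegression().fit(X[idx], Y[idx], sample_weight=w[idx])
print("positive rate by group:",
      [clf.predict(X[A == g]).mean() for g in (0, 1)])
```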

Causal and distributional alignment methods employ Bayesian network modeling or optimal transport (e.g., Wasserstein barycenter matching) to enforce higher-order matching of output distributions across sensitive groups (Oneto et al., 2020). In the presence of clustered data, mixed-effects models such as FMESVM jointly incorporate random (cluster) effects and fairness constraints to control bias in hierarchical structures (Burgard et al., 10 May 2024).
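
In one dimension, the Wasserstein-2 barycenter of group score distributions reduces to a size-weighted average of the group quantile functions, so a distributional repair of the kind referenced above can be sketched in a few lines; this is a simplified illustration under that one-dimensional assumption, not the procedure of the cited work.

```python
import numpy as np

def barycenter_repair(scores, groups, n_quantiles=100):
    """Map each group's score distribution onto the 1-D Wasserstein barycenter,
    i.e. the size-weighted average of the group quantile functions."""
    scores, groups = np.asarray(scores, float), np.asarray(groups)
    qs = np.linspace(0, 1, n_quantiles)
    labels, counts = np.unique(groups, return_counts=True)
    quantiles = {g: np.quantile(scores[groups == g], qs) for g in labels}
    bary = sum((c / len(scores)) * quantiles[g] for g, c in zip(labels, counts))
    repaired = np.empty_like(scores)
    for g in labels:
        m = groups == g
        # Rank each score within its group, then read off the barycenter quantile.
        ranks = np.searchsorted(np.sort(scores[m]), scores[m], side="right") / m.sum()
        repaired[m] = np.interp(ranks, qs, bary)
    return repaired

# Usage (illustrative): repaired = barycenter_repair(model_scores, sensitive_attr)
```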

Algorithmic interventions are evaluated empirically for trade-offs between fairness (across multiple metrics), predictive power, calibration, and computational efficiency. No single mitigation yields universally optimal fairness–accuracy trade-offs; performance is context- and dataset-dependent (Jones et al., 2020, Caton et al., 2020).

5. Theoretical Insights and Impact of Data Geometry

Theoretical work provides conditions under which residual unfairness persists, notably due to selection bias or censoring mechanisms that differ across protected groups (Kallus et al., 2018). When censoring depends jointly on outcome and protected status ($P(S=1 \mid Y, A)$ varies over $A$), empirical fairness adjustments are insufficient—termed the “bias in, bias out” phenomenon—and fair estimates require explicit modeling or correction for the inclusion process.

Studies leveraging statistical physics and exactly solvable data models reveal that bias can originate solely from data geometry: imbalances in group representation, group-specific covariance structure, or teacher rule heterogeneity induce systematic error disparities even for idealized learners. Matched inference approaches—coupling models specialized for different subpopulations with elastic regularization—can achieve superior fairness–accuracy trade-offs by harnessing cross-group information (Mannelli et al., 2022).

On synthetic and real datasets (e.g., CelebA, MEPS, Adult, COMPAS), these models often reproduce and explain empirical phenomena such as observed disparities in error rates, disparate impact, or accuracy gaps, highlighting the interaction between data structure and algorithmic fairness (Pagano et al., 2022, Mannelli et al., 2022).

6. Applications, Libraries, and Practical Recommendations

Fairness considerations span a diversity of tasks: binary and multiclass classification, regression, ranking and recommendation, unsupervised learning, bandit and reinforcement learning, and robot learning (Caton et al., 2020, Londoño et al., 2022, Rashed et al., 13 Dec 2024). Real-world case studies include recidivism risk assessment (COMPAS), mental health prediction (Mosteiro et al., 2022, Dang et al., 2022), diabetic retinopathy screening (Raza et al., 2023), automated credit scoring, and gender/race bias in facial recognition and translation.

Open-source toolkits such as IBM AIF360, Microsoft Fairlearn, Aequitas, and TensorFlow Responsible AI provide practitioners with modular implementations of metrics, diagnostic workflows, and mitigation algorithms (pre-, in-, and post-processing) (Caton et al., 2020, Pagano et al., 2022, Rashed et al., 13 Dec 2024). These libraries differ in their support for multiclass settings, computational scalability, and available mitigation methods.
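
A minimal audit-and-mitigate sketch using Fairlearn, assuming a recent release alongside scikit-learn; the synthetic data and the choice of parity constraint are illustrative, not a recommended configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
A = rng.binomial(1, 0.3, size=2000)                       # sensitive attribute
y = (X[:, 0] + 0.8 * A + rng.normal(size=2000) > 0).astype(int)

baseline = LogisticRegression().fit(X, y)
y_pred = baseline.predict(X)

# Audit: per-group accuracy and selection rate, plus a scalar disparity.
frame = MetricFrame(metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
                    y_true=y, y_pred=y_pred, sensitive_features=A)
print(frame.by_group)
print("DP difference:", demographic_parity_difference(y, y_pred, sensitive_features=A))

# In-processing mitigation via the reductions approach with a parity constraint.
mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=A)
print("DP difference (mitigated):",
      demographic_parity_difference(y, mitigator.predict(X), sensitive_features=A))
```

As the surrounding discussion emphasizes, any accuracy lost to the parity constraint should be reported alongside the fairness gain for each relevant subgroup.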

Practical recommendations include:

  • Conducting fairness audits (using multiple metrics) alongside performance evaluation during model development and deployment;
  • Incorporating fairness constraints or regularizers at the earliest stages possible;
  • Choosing and tuning mitigation strategies in application-specific contexts, with awareness of the inherent trade-offs between fairness, performance, and interpretability;
  • Ensuring representative data collection and robust outcome measurement, with explicit causal or censoring mechanism modeling in high-stakes domains;
  • Transparent reporting of model performance and fairness impact for all relevant subgroups, with context-dependent selection of operational fairness definitions (Dang et al., 2022, Raza et al., 2023, Jones et al., 2020).

7. Open Problems and Future Directions

Several grand challenges structure ongoing research. These include:

  • Lack of universally accepted, harmonized fairness definitions—group and individual metrics are often mutually incompatible, compelling nuanced prioritization (Caton et al., 2020, Chouldechova et al., 2018, Mehrabi et al., 2019);
  • Long-term and dynamic fairness: feedback loops and model/data drift may create or perpetuate unfairness over time, especially in complex or multi-component systems such as dynamic decision pipelines (Chouldechova et al., 2018, Pagano et al., 2022);
  • Explainability: the need for fairness metrics to be interpretable and actionable by diverse stakeholders, including those subject to regulatory or legal requirements (Zhou et al., 2021);
  • Black-box and federated settings: extending fairness-aware methodologies to models where internals are opaque or distributed across multiple agents (Pagano et al., 2022);
  • Integration of causal inference and counterfactual reasoning into standard ML pipelines for more robust and theoretically grounded debiasing (Oneto et al., 2020);
  • Standardization and benchmarking of metrics, interventions, and open datasets for reliable comparison and reproducibility (Pagano et al., 2022);
  • Addressing equity (providing what disadvantaged groups need to achieve equivalent outcomes) as distinct from pure equality in model outputs (Mehrabi et al., 2019).

The literature consistently underscores that technical interventions must be accompanied by ethical, legal, and social understanding, as all algorithmic fairness metrics and methods remain fundamentally contextual—depending on application, societal values, data properties, and structural constraints.


The field of bias and fairness in machine learning thus encompasses precise mathematical formalizations, algorithmic and statistical interventions, theoretical guarantees, and an expanding suite of practical tools, all responsive to fundamental challenges at the intersection of computation, society, and ethics. Persistent obstacles—such as residual unfairness due to data censoring, incompatibility of fairness metrics, or the role of data geometry—necessitate holistic approaches that unite rigorous modeling, ongoing monitoring, principled auditing, and socially-aware evaluation (Kallus et al., 2018, Caton et al., 2020, Mannelli et al., 2022).