An Empirical Study of Rich Subgroup Fairness for Machine Learning
The paper by Kearns et al. centers on rich subgroup fairness in machine learning and its potential to mitigate fairness concerns that standard statistical definitions of fairness do not adequately address. Traditional fairness metrics typically enforce fairness only across a small number of broadly defined protected groups, such as racial or gender groups, using aggregate statistics like false positive rates. This methodology can overlook disparities within smaller, more nuanced subgroups, a phenomenon described in the literature as "fairness gerrymandering." The paper extends the authors' earlier theoretical work by evaluating, on real datasets, the practical effectiveness of the subgroup-fairness algorithm they previously proposed.
Rich subgroup fairness enforces fairness constraints over a large, often combinatorially structured collection of subgroups defined by a class of functions of bounded VC dimension over the protected attributes. The approach requires that a statistical fairness constraint, such as equality of false positive rates, hold approximately on every sufficiently large subgroup in this class, including intersections of protected attributes, yielding fairness guarantees that are more robust and closer to individual-level protection than group-level statistics alone.
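To make the constraint concrete, the following is a minimal sketch, in Python, of the size-weighted false positive disparity that rich subgroup fairness bounds. It is not the paper's code: the random search over linear-threshold subgroups, the function names, and the sample sizes are illustrative assumptions (the paper's auditor replaces the random search with a learning oracle).

```python
import numpy as np

def fp_rate(y_true, y_pred, mask=None):
    """False positive rate of y_pred, optionally restricted to rows where mask is True."""
    if mask is None:
        mask = np.ones(len(y_true), dtype=bool)
    negatives = (y_true == 0) & mask
    if negatives.sum() == 0:
        return 0.0
    return float(y_pred[negatives].mean())

def subgroup_fp_violation(y_true, y_pred, subgroup_mask):
    """Size-weighted disparity of one subgroup:
    (fraction of data in the subgroup) * |overall FP rate - subgroup FP rate|."""
    gap = abs(fp_rate(y_true, y_pred) - fp_rate(y_true, y_pred, subgroup_mask))
    return float(subgroup_mask.mean()) * gap

def worst_linear_subgroup_violation(y_true, y_pred, protected, n_samples=2000, seed=0):
    """Crude audit by random search: sample linear-threshold subgroups over the
    protected attributes and return the largest weighted FP disparity found."""
    rng = np.random.default_rng(seed)
    _, d = protected.shape
    worst = 0.0
    for _ in range(n_samples):
        w = rng.standard_normal(d)
        scores = protected @ w
        threshold = rng.choice(scores)  # threshold at one of the observed scores
        worst = max(worst, subgroup_fp_violation(y_true, y_pred, scores > threshold))
    return worst
```

Weighting the FP-rate gap by the subgroup's share of the data keeps the constraint statistically meaningful: very small subgroups cannot produce large violations from sampling noise alone.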
The paper empirically evaluates the algorithm of Kearns et al. on four datasets where fairness is a legitimate concern: Communities and Crime, Law School, Student Performance, and Adult Income. The objective is to determine how quickly the algorithm converges, what trade-offs it offers between fairness and accuracy, and how it compares with algorithms that enforce only marginal fairness constraints.
Key Empirical Findings
- Convergence Properties: Despite using heuristic learning oracles in place of the perfect oracles assumed by the theory, the algorithm converges quickly across the datasets. The exact behavior varies, but on the Communities and Crime dataset in particular it rapidly reduced subgroup unfairness with only minor sacrifices in accuracy (a simplified sketch of the underlying Learner/Auditor dynamic appears after this list).
- Trade-offs Between Fairness and Accuracy: Across the datasets, the algorithm balances rich subgroup fairness against accuracy effectively, achieving trade-offs that suggest these fairness constraints can be enforced without significantly sacrificing classification performance. In the Communities and Crime dataset, for instance, subgroup unfairness was reduced drastically with minimal impact on error rates (the second sketch after this list shows how such a trade-off curve can be traced).
- Comparison with Marginal Fairness: The comparison with marginal fairness approaches showed that optimizing only for standard marginal fairness can produce models that remain unfair at the rich subgroup level. The MARGINAL algorithm failed to reach the same levels of subgroup fairness as the SUBGROUP algorithm on most datasets, underscoring the need for the more comprehensive approach advocated by Kearns et al.
- Dynamics and Visualization: The paper uses discrimination-surface heatmaps and trajectory plots to visualize the algorithm's dynamics, showing how subgroup unfairness is systematically reduced over iterations and making the drop in unfair classification rates for small, identifiable subgroups directly visible.
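The convergence finding above concerns the interaction between a Learner and an Auditor, each implemented with heuristic, regression-based oracles. The fragment below is a deliberately simplified, self-contained sketch of that kind of dynamic, not the authors' fictitious-play implementation: a logistic-regression learner is repeatedly retrained with extra weight on negative examples inside whatever subgroup a linear-regression auditor currently flags. The function names, the auditor's 0.5 threshold, and the reweighting rule are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def fp_disparity(y, y_hat, mask):
    """Size-weighted FP-rate gap between the subgroup selected by mask and the full sample."""
    neg, sub_neg = (y == 0), (y == 0) & mask
    if sub_neg.sum() == 0:
        return 0.0
    return float(mask.mean()) * abs(y_hat[neg].mean() - y_hat[sub_neg].mean())

def audit(y, y_hat, protected):
    """Heuristic auditor: regress an indicator of false positives on the protected
    attributes, then threshold the prediction to propose a violating subgroup."""
    false_pos = ((y == 0) & (y_hat == 1)).astype(float)
    scores = LinearRegression().fit(protected, false_pos).predict(protected)
    mask = scores > 0.5
    return mask, fp_disparity(y, y_hat, mask)

def iterative_reweighting(X, y, protected, gamma=0.01, n_iters=30, penalty=2.0):
    """Simplified Learner/Auditor loop: upweight negatives inside the currently
    flagged subgroup and retrain until the audited violation falls below gamma."""
    weights = np.ones(len(y))
    clf, history = None, []
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
        y_hat = clf.predict(X)
        mask, violation = audit(y, y_hat, protected)
        history.append(violation)
        if violation <= gamma:
            break
        weights[(y == 0) & mask] *= penalty  # discourage false positives in that subgroup
    return clf, history
```

The auditor is the weakest link in such a scheme: linear regression plus a fixed threshold can miss violating subgroups entirely, which is why the behavior of heuristic oracles, lacking worst-case guarantees, is a question the paper settles empirically.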
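Building on that sketch (and reusing its fp_disparity, audit, and iterative_reweighting helpers), the next fragment illustrates the other two findings: sweeping the tolerance gamma traces an error versus subgroup-unfairness trade-off curve, and comparing the audited worst-subgroup violation against the largest marginal FP gap (computed one protected attribute at a time) makes the gerrymandering failure mode measurable. The gamma grid and helper names are assumptions for illustration, not values from the paper.

```python
def marginal_fp_gap(y, y_hat, protected):
    """Largest size-weighted FP gap over the marginal groups (one protected column at a time)."""
    return max(fp_disparity(y, y_hat, protected[:, j] == v)
               for j in range(protected.shape[1])
               for v in np.unique(protected[:, j]))

def tradeoff_curve(X, y, protected, gammas=(0.05, 0.02, 0.01, 0.005)):
    """One (error, subgroup violation, marginal gap) point per fairness tolerance."""
    points = []
    for gamma in gammas:
        clf, _ = iterative_reweighting(X, y, protected, gamma=gamma)
        y_hat = clf.predict(X)
        _, subgroup_violation = audit(y, y_hat, protected)
        points.append({
            "gamma": gamma,
            "error": float((y_hat != y).mean()),
            "subgroup_violation": subgroup_violation,
            "marginal_gap": marginal_fp_gap(y, y_hat, protected),
        })
    return points
```

A model can keep the marginal gap small while the worst-subgroup violation stays large; that divergence is exactly what the comparison with the MARGINAL baseline measures.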
Implications and Future Directions
The practical effectiveness of rich subgroup fairness demonstrated in the paper has significant implications for both the theory and practice of fair machine learning. For theory, it challenges the field's reliance on marginal fairness constraints by offering a more holistic approach that accounts for subgroup-level biases. For practice, it suggests that stakeholders in domains such as criminal justice and financial services, where decisions are sensitive to fairness, should consider adopting algorithms designed to address subgroup disparities.
Future research could explore moving from linear threshold classifiers to richer hypothesis spaces, which may further improve the balance between fairness and accuracy. In addition, investigating how the in-sample findings generalize to held-out test data could provide insight into the algorithm's robustness and its practical applicability in deployed systems (a brief sketch of such an out-of-sample check follows).
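As a purely illustrative example of that out-of-sample check, the sketch below reuses the iterative_reweighting and audit helpers from the earlier fragments to audit subgroup unfairness on a held-out split as well as on the training data; it is an assumed experimental setup, not something reported in the paper.

```python
from sklearn.model_selection import train_test_split

def in_vs_out_of_sample_violation(X, y, protected, gamma=0.01, seed=0):
    """Fit the fairness loop on a training split, then audit subgroup unfairness
    both in-sample and on held-out data."""
    idx_train, idx_test = train_test_split(np.arange(len(y)), test_size=0.3, random_state=seed)
    clf, _ = iterative_reweighting(X[idx_train], y[idx_train], protected[idx_train], gamma=gamma)
    results = {}
    for split, idx in (("train", idx_train), ("test", idx_test)):
        _, violation = audit(y[idx], clf.predict(X[idx]), protected[idx])
        results[split] = violation
    return results
```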
In conclusion, this paper provides a compelling argument for the adoption of rich subgroup fairness in machine learning. It substantiates previous theoretical claims with empirical evidence, demonstrating that addressing subgroup fairness is not only necessary but also a viable goal that can be attained with reasonable computational and accuracy costs. The paper contributes valuable insights for advancing fair AI practices and developing enhanced fairness-aware algorithms.