Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination
The paper by Nathan Kallus, Xiaojie Mao, and Angela Zhou addresses a significant challenge in assessing algorithmic fairness: protected class membership, such as race or gender, is often unobserved in the datasets being audited. The issue is prevalent in critical domains like lending and healthcare, where fairness evaluation is paramount but direct observation of protected classes is often missing due to legal or practical constraints. The authors study what can be learned about algorithmic fairness by combining the primary dataset with an auxiliary dataset, such as US census data, that links proxy variables like surname and geolocation to the protected class.
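To make the proxy construction concrete, here is a minimal sketch (hypothetical numbers and a hypothetical helper, not the paper's data or code) of one common recipe for such probabilities: combining a surname-based race distribution with a geolocation-based one via Bayes' rule, in the spirit of BISG-style methods and under a conditional-independence assumption.

```python
# A minimal sketch of forming proxy probabilities for an unobserved protected
# class (illustrative numbers; not the paper's implementation).
# Assumption: surname and census tract are conditionally independent given race.
import numpy as np

def proxy_probabilities(p_race_given_surname, p_race_given_tract, p_race):
    """Posterior P(race | surname, tract) via Bayes' rule:
    P(r | s, g) is proportional to P(r | s) * P(r | g) / P(r)."""
    unnormalized = p_race_given_surname * p_race_given_tract / p_race
    return unnormalized / unnormalized.sum()

# Hypothetical example with three race categories.
p_race = np.array([0.6, 0.3, 0.1])                # overall population shares
p_race_given_surname = np.array([0.2, 0.7, 0.1])  # from a surname table
p_race_given_tract = np.array([0.5, 0.4, 0.1])    # from a census-tract table
print(proxy_probabilities(p_race_given_surname, p_race_given_tract, p_race))
```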
Key Contributions
- Problem Formulation:
- The authors formulate the assessment of algorithmic fairness with unobserved protected classes as a data combination problem involving two datasets: a primary dataset that records decisions or outcomes and proxy variables but no protected class labels, and an auxiliary dataset that records proxy variables and protected class labels but no decisions or outcomes. This split is what creates the identification problem at the heart of the paper (a minimal formalization is sketched after this list).
- Identification Conditions:
- The paper analyzes the conditions under which fairness metrics, like disparate impact, can be identified from the available data. Without strong assumptions or highly informative proxies, these metrics are shown not to be point-identified; only bounds on them can be learned (see the per-cell bounds sketched after this list).
- Characterizing Partial Identification Sets:
- Using optimization-based characterizations, the authors derive the sharp partial identification sets for several disparity measures, that is, the tightest range of disparity values consistent with the observed data (a simplified linear-programming illustration follows the list).
- Methodology for Estimation and Inference:
- The paper develops statistical methods to estimate the partial identification sets and to conduct inference that accounts for sampling uncertainty, so that reported disparity bounds reflect both statistical uncertainty and the ambiguity left by the unobserved protected class (a rough resampling illustration follows the list).
- Empirical Applications:
- Real-world case studies in mortgage lending and personalized medicine demonstrate the approach's applicability, showcasing how one can use the presented methods to obtain robust fairness assessments in practical scenarios.
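As a minimal formalization of the data-combination setup referenced above (our notation, simplified to a binary decision Ŷ, a binary protected class A, and a discrete proxy Z): the primary data identify the decision rate within each proxy cell, the auxiliary data identify the protected class composition of each cell, but the disparity of interest depends on a joint distribution of (Ŷ, A) that appears in neither dataset.

```latex
% Identified from the data:
%   primary dataset:    P(\hat{Y} = 1 \mid Z = z)
%   auxiliary dataset:  P(A = a \mid Z = z), \quad P(Z = z)
% Target, e.g. demographic disparity, depends on the unobserved joint of (\hat{Y}, A):
\delta = P(\hat{Y} = 1 \mid A = 1) - P(\hat{Y} = 1 \mid A = 0)
       = \frac{\sum_z P(\hat{Y} = 1, A = 1 \mid Z = z)\, P(Z = z)}{P(A = 1)}
       - \frac{\sum_z P(\hat{Y} = 1, A = 0 \mid Z = z)\, P(Z = z)}{P(A = 0)},
% where P(A = a) = \sum_z P(A = a \mid Z = z)\, P(Z = z) is itself identified.
```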
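Within each proxy cell, the unobserved joint is constrained only by its two identified marginals, which yields Fréchet-style bounds rather than a unique value; the bounds collapse to a point only when a cell is essentially homogeneous in Ŷ or A, i.e., when the proxies are highly informative.

```latex
\max\{0,\; P(\hat{Y} = 1 \mid Z = z) + P(A = a \mid Z = z) - 1\}
\;\le\; P(\hat{Y} = 1,\, A = a \mid Z = z) \;\le\;
\min\{P(\hat{Y} = 1 \mid Z = z),\; P(A = a \mid Z = z)\}
```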
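To see how optimization turns these cell-level constraints into bounds on a disparity measure, the sketch below minimizes and maximizes the demographic disparity over all joints consistent with the identified marginals; in this simplified binary/discrete setting that is a linear program. This is an illustration of the idea with hypothetical marginals, not the authors' estimator.

```python
# Simplified illustration (not the authors' code): with binary Yhat and A and a
# discrete proxy Z, the data pin down only the marginals of (Yhat, A) within
# each proxy cell. Minimizing / maximizing the demographic disparity over all
# joints consistent with those marginals is a linear program.
import numpy as np
from scipy.optimize import linprog


def disparity_bounds(p_z, p_yhat1_given_z, p_a1_given_z):
    """Bounds on P(Yhat=1 | A=1) - P(Yhat=1 | A=0) given per-cell marginals."""
    K = len(p_z)
    p_a1 = float(p_z @ p_a1_given_z)          # P(A=1) is identified
    p_a0 = 1.0 - p_a1

    # Unknowns: x[z, y, a] = P(Yhat=y, A=a | Z=z), flattened to length 4K.
    def idx(z, y, a):
        return 4 * z + 2 * y + a

    n = 4 * K
    c = np.zeros(n)
    for z in range(K):
        c[idx(z, 1, 1)] = p_z[z] / p_a1       # contributes to  P(Yhat=1 | A=1)
        c[idx(z, 1, 0)] = -p_z[z] / p_a0      # contributes to -P(Yhat=1 | A=0)

    # Within each cell, the joint must reproduce both identified marginals.
    A_eq, b_eq = [], []
    for z in range(K):
        for y in (0, 1):                      # sum_a x[z, y, a] = P(Yhat=y | Z=z)
            row = np.zeros(n)
            row[idx(z, y, 0)] = row[idx(z, y, 1)] = 1.0
            A_eq.append(row)
            b_eq.append(p_yhat1_given_z[z] if y else 1 - p_yhat1_given_z[z])
        for a in (0, 1):                      # sum_y x[z, y, a] = P(A=a | Z=z)
            row = np.zeros(n)
            row[idx(z, 0, a)] = row[idx(z, 1, a)] = 1.0
            A_eq.append(row)
            b_eq.append(p_a1_given_z[z] if a else 1 - p_a1_given_z[z])

    lo = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * n)
    hi = linprog(-c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * n)
    return lo.fun, -hi.fun


# Hypothetical marginals (in practice these would be estimated from the data).
print(disparity_bounds(np.array([0.5, 0.3, 0.2]),    # P(Z = z)
                       np.array([0.6, 0.4, 0.2]),    # P(Yhat = 1 | Z = z)
                       np.array([0.7, 0.5, 0.1])))   # P(A = 1 | Z = z)
```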
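The bounds above treat the cell probabilities as known, whereas in practice they are estimated. As a rough way to see how sampling variability widens the reported range, the snippet below resamples the cell probabilities from hypothetical counts and re-solves the program; this stands in for, and is much cruder than, the paper's formal inference procedure. It reuses the `disparity_bounds` function from the previous sketch.

```python
# Illustration only: propagate sampling uncertainty by resampling the cell
# probabilities from hypothetical counts and re-solving the bounds each time.
# Assumes disparity_bounds(...) from the previous sketch is already in scope.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts per proxy cell z (one row per cell).
primary_counts = np.array([[300, 200],    # columns: Yhat = 1, Yhat = 0
                           [120, 180],
                           [ 40, 160]])
auxiliary_counts = np.array([[350, 150],  # columns: A = 1, A = 0
                             [150, 150],
                             [ 20, 180]])

lowers, uppers = [], []
for _ in range(500):
    p_z = rng.dirichlet(primary_counts.sum(axis=1) + 1)
    p_yhat1 = rng.beta(primary_counts[:, 0] + 1, primary_counts[:, 1] + 1)
    p_a1 = rng.beta(auxiliary_counts[:, 0] + 1, auxiliary_counts[:, 1] + 1)
    lo, hi = disparity_bounds(p_z, p_yhat1, p_a1)
    lowers.append(lo)
    uppers.append(hi)

print("bounds, widened for sampling uncertainty:",
      (float(np.percentile(lowers, 2.5)), float(np.percentile(uppers, 97.5))))
```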
Practical Implications and Future Directions
This paper's methodologies serve as practical tools for industries where fairness is vital yet hard to quantify because protected class data are incomplete. Deploying these methods yields bounds on disparities, rather than point estimates, that remain credible when protected attributes are not directly observed.
The research opens pathways for future work to strengthen fairness assessments in machine learning models. Future directions may include refining algorithms for inferring protected classes, incorporating more informative proxy measures, or extending the methodologies to intersectional group analyses. This paper lays a foundation for tackling fairness in the absence of complete data and underscores the importance of transparent methodological frameworks.
In conclusion, the research by Kallus, Mao, and Zhou contributes significantly to the discourse on algorithmic fairness, proposing innovative solutions for challenges posed by unobserved protected classes. This work is indispensable for policymakers, practitioners, and researchers dedicated to advancing fairness in algorithmic decision-making without the complete demographic data that such assessments ideally require.