
Fair prediction with disparate impact: A study of bias in recidivism prediction instruments (1610.07524v1)

Published 24 Oct 2016 in stat.AP, cs.CY, and stat.ML

Abstract: Recidivism prediction instruments provide decision makers with an assessment of the likelihood that a criminal defendant will reoffend at a future point in time. While such instruments are gaining increasing popularity across the country, their use is attracting tremendous controversy. Much of the controversy concerns potential discriminatory bias in the risk assessments that are produced. This paper discusses a fairness criterion originating in the field of educational and psychological testing that has recently been applied to assess the fairness of recidivism prediction instruments. We demonstrate how adherence to the criterion may lead to considerable disparate impact when recidivism prevalence differs across groups.


Summary

Fair Prediction with Disparate Impact: An Analysis of Bias in Recidivism Prediction Instruments

The paper "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments" authored by Alexandra Chouldechova examines the potential biases inherent in recidivism prediction instruments (RPIs). RPIs are used to evaluate the likelihood that a criminal defendant will reoffend in the future. Although these tools are widely implemented in various judiciary procedures, concerns about their fairness and potential discriminatory impacts have been increasingly discussed.

Introduction and Background

Recidivism prediction instruments are employed extensively within the criminal justice system to assist in pretrial decision-making, parole decisions, and occasionally even sentencing. However, the application of these tools has spurred controversy. The primary focus of the paper is to scrutinize how these instruments comply with fairness criteria, especially when recidivism rates differ among demographic groups. Specifically, the paper uses data from Broward County provided by ProPublica to analyze the COMPAS risk score, a commonly used RPI.

Main Contributions

The paper presents two central contributions:

  1. It delineates the link between the psychometric definition of test fairness (well-calibration) and classification error rates.
  2. It explicates how differing false positive and false negative rates between demographic groups can culminate in disparate impact, especially when high-risk assessments lead to more severe penalties.

Methodology and Results

Definition of Test Fairness

The paper employs the psychometric definition of test fairness: a score is considered fair if it predicts the same likelihood of recidivism regardless of group membership. Mathematically, for every score value $s$,

$$P(Y = 1 \mid S = s, R = b) = P(Y = 1 \mid S = s, R = w).$$

This criterion ensures that the same risk score implies the same probability of recidivism for individuals from different racial groups.
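
As a minimal illustrative sketch (not the paper's own code), predictive parity can be checked by comparing observed recidivism rates at each score level across groups. The column names below follow the ProPublica COMPAS release but are assumptions here:

```python
import pandas as pd

def check_test_fairness(df: pd.DataFrame,
                        score_col: str = "decile_score",
                        outcome_col: str = "two_year_recid",
                        group_col: str = "race") -> pd.DataFrame:
    """Estimate P(Y = 1 | S = s, R = r) for each score level and group.

    Test fairness (predictive parity) holds when, at every score level s,
    the observed recidivism rate is (approximately) equal across groups.
    """
    return (df.groupby([score_col, group_col])[outcome_col]
              .mean()
              .unstack(group_col))

# Usage on a COMPAS-style dataframe:
#   rates = check_test_fairness(compas_df)
#   print(rates)  # rows: score levels; columns: groups; fair => equal rows
```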

Confounding Factors and Error Rates

By introducing a coarsened score $S_c$ that dichotomizes the risk score into high-risk (HR) and low-risk (LR) categories, the paper demonstrates the inevitability of imbalanced false positive rates (FPR) and false negative rates (FNR) when recidivism prevalence varies across groups. The key observation is the identity $$\mathrm{FPR} = \frac{p}{1-p}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,(1-\mathrm{FNR}),$$ where $p$ is a group's recidivism prevalence and PPV is the positive predictive value of the high-risk label: if test fairness holds (equal PPV across groups) but prevalence differs, the groups cannot have equal FPR and FNR simultaneously. Given the different recidivism rates of Black and White defendants, the analysis concludes that the racial discrepancies in error rates reported by ProPublica are a natural consequence of applying a test-fair RPI to a heterogeneous population.
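
The identity can be checked numerically; the sketch below uses illustrative values for prevalence, PPV, and FNR rather than the paper's estimates:

```python
def fpr_from_identity(p: float, ppv: float, fnr: float) -> float:
    """FPR implied by test fairness: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR).

    p   : recidivism prevalence in the group
    ppv : positive predictive value of the high-risk label (equal across
          groups under test fairness)
    fnr : false negative rate in the group
    """
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

# Two groups sharing the same PPV and FNR but differing in prevalence
# necessarily differ in FPR:
ppv, fnr = 0.6, 0.3
print(fpr_from_identity(p=0.5, ppv=ppv, fnr=fnr))  # ~0.467
print(fpr_from_identity(p=0.4, ppv=ppv, fnr=fnr))  # ~0.311
```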

Disparate Impact

Chouldechova's analysis further shows that these discrepancies in error rates lead to tangible disparate impacts when RPIs are used to decide penalties. In particular, the disparities in FPR and FNR translate directly into differing average penalties, thereby affecting racial groups disproportionately even when the instrument itself is free from predictive bias.
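
As a hedged numerical sketch: suppose the low-risk and high-risk labels carry penalties $t_L < t_H$ (the penalty values below are hypothetical; the FPRs echo the roughly 45% vs. 23% figures ProPublica reported for COMPAS). A defendant who would not reoffend then faces an expected penalty that scales with the group's FPR:

```python
def expected_penalty_nonrecidivist(fpr: float,
                                   t_low: float,
                                   t_high: float) -> float:
    """Expected penalty for a defendant who would not reoffend (Y = 0).

    With probability FPR the defendant is labeled high-risk and receives
    the harsher penalty t_high; otherwise the lighter penalty t_low.
    """
    return fpr * t_high + (1 - fpr) * t_low

# Hypothetical penalties (e.g., months of detention) combined with
# group-level FPRs in the vicinity of the ProPublica figures:
t_low, t_high = 1, 12
print(expected_penalty_nonrecidivist(0.45, t_low, t_high))  # ~5.95
print(expected_penalty_nonrecidivist(0.23, t_low, t_high))  # ~3.53
```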

Empirical Evidence

The empirical study uses the ProPublica COMPAS dataset to assess whether the COMPAS score satisfies the test fairness criterion. Using regression fits and plots of observed recidivism rates across risk scores, the paper finds approximate predictive parity for Black and White defendants given the same score.
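
A sketch of this style of check, using synthetic data in place of the actual Broward County records (the column names and the logistic-regression specification are assumptions, not the paper's exact analysis):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for COMPAS-style data: under a test-fair score,
# recidivism depends on the score alone, not on group membership.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "decile_score": rng.integers(1, 11, n),
    "race": rng.choice(["African-American", "Caucasian"], n),
})
p = 0.1 + 0.08 * (df["decile_score"] - 1)  # recidivism prob. rises with score
df["two_year_recid"] = rng.binomial(1, p)

# Does race add predictive power beyond the score itself?
result = smf.logit("two_year_recid ~ decile_score + C(race)", data=df).fit()
print(result.summary())  # under predictive parity, the race term is ~0
```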

Implications and Future Directions

The study questions the adequacy of relying solely on predictive parity (test fairness) and underscores the need to weigh error-rate balance alongside it. Because a test-fair score applied to groups with different recidivism prevalence cannot equalize false positive and false negative rates, broader and more nuanced fairness measures are needed.

The implications of this study are both practical and theoretical. Policymakers and practitioners must be cognizant of the potential for legal and social disparities generated by RPIs and might need to consider additional fairness criteria beyond test fairness. Further research could explore the integration of fairness adjustments to RPIs that balance error rates while retaining predictive accuracy.

Conclusions

The paper provides a measured and empirically backed analysis of the limitations of current fairness standards in RPIs. The illustration of how fair instruments can still lead to disparate impacts broadens the conversation around fairness in machine learning and decision-making systems, particularly in the criminal justice system. Ensuring ethical applications of these instruments requires not only rigorous testing and validation but also evolving standards that account for the real-world implications of these predictive tools.

By illuminating the complexities and consequences of disparate error rates, this study offers a foundational step toward more equitable applications of AI and data-driven decision-making in judicial contexts.
