Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction (1807.00199v1)

Published 30 Jun 2018 in cs.LG and stat.ML

Abstract: Recidivism prediction scores are used across the USA to determine sentencing and supervision for hundreds of thousands of inmates. One such generator of recidivism prediction scores is Northpointe's Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) score, used in states like California and Florida, which past research has shown to be biased against black inmates according to certain measures of fairness. To counteract this racial bias, we present an adversarially-trained neural network that predicts recidivism and is trained to remove racial bias. When comparing the results of our model to COMPAS, we gain predictive accuracy and get closer to achieving two out of three measures of fairness: parity and equality of odds. Our model can be generalized to any prediction and demographic. This piece of research contributes an example of scientific replication and simplification in a high-stakes real-world application like recidivism prediction.

Citations (173)

Summary

  • The paper introduces an adversarial model that improves fairness and accuracy by mitigating bias in recidivism predictions.
  • It demonstrates significant reductions in High Risk, False Positive, and False Negative gaps, thereby enhancing parity and equality of odds.
  • The study shows the model achieving an ROC AUC of 0.70 versus COMPAS's 0.66, while noting remaining calibration challenges.

Achieving Fairness through Adversarial Learning in Recidivism Prediction

The paper applies adversarial learning to achieve fairness in recidivism prediction, motivated by the documented bias of the widely used COMPAS score. The authors propose an adversarially-trained neural network that predicts recidivism while incorporating a mechanism to remove racial bias. The work responds to racial discrepancies identified in COMPAS predictions, particularly its reported bias against black inmates, which raises serious ethical concerns given the score's role in influencing judicial decisions.

The authors deploy an adversarial neural network to improve both the accuracy and fairness of recidivism predictions. By training the network to predict recidivism while concurrently suppressing racial signal, they improve over COMPAS on parity and equality of odds, a claim supported by the reported comparisons: the High Risk Gap, False Positive Gap, and False Negative Gap between white and black inmates all shrink substantially under their model. The model also achieves an area under the ROC curve of 0.70, surpassing the 0.66 of the COMPAS scores. However, the paper notes that the adversarial model is less well calibrated than COMPAS.
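
To make these metrics concrete, the sketch below computes the group gaps in high-risk rate, false positive rate, and false negative rate, plus overall ROC AUC, from binary labels, risk scores, and a binary group indicator. This is not the paper's code; the function name, the 0.5 threshold, and the gap labels are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fairness_gaps(y_true, y_score, group, threshold=0.5):
    """Between-group gaps of the kind discussed above (illustrative, not the paper's code).

    y_true  : 1 = recidivated, 0 = did not
    y_score : model risk scores in [0, 1]
    group   : 0/1 indicator for the two demographic groups being compared
    """
    y_pred = (y_score >= threshold).astype(int)
    rates = {}
    for g in (0, 1):
        m = group == g
        pos = y_true[m] == 1
        neg = y_true[m] == 0
        rates[g] = {
            "high_risk": y_pred[m].mean(),                                    # P(high risk | group)
            "fpr": y_pred[m][neg].mean() if neg.any() else np.nan,            # P(high risk | no recidivism)
            "fnr": (1 - y_pred[m][pos]).mean() if pos.any() else np.nan,      # P(low risk | recidivism)
        }
    return {
        "high_risk_gap": abs(rates[1]["high_risk"] - rates[0]["high_risk"]),  # parity
        "fp_gap": abs(rates[1]["fpr"] - rates[0]["fpr"]),                     # equality of odds
        "fn_gap": abs(rates[1]["fnr"] - rates[0]["fnr"]),                     # equality of odds
        "roc_auc": roc_auc_score(y_true, y_score),
    }
```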

The adversarial methodology decouples the recidivism prediction from demographic signals, targeting representational parity and avoidance of disparate impact across sensitive attributes such as race. The paper thereby corroborates prior work advocating adversarial strategies for mitigating fairness issues in machine learning models, and demonstrates their applicability in high-stakes settings such as judicial risk assessment.
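
One common way to realize this decoupling is gradient-reversal adversarial training, sketched below in PyTorch. The paper's exact architecture, losses, and training schedule are not reproduced here; the layer sizes, the `lam` trade-off weight, and the gradient-reversal mechanism are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class FairRecidivismNet(nn.Module):
    """Shared encoder with a recidivism head and an adversarial race head (illustrative sketch)."""
    def __init__(self, n_features, hidden=32, lam=1.0):
        super().__init__()
        self.lam = lam
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.predictor = nn.Linear(hidden, 1)   # recidivism logit
        self.adversary = nn.Linear(hidden, 1)   # race logit, trained through reversed gradients

    def forward(self, x):
        h = self.encoder(x)
        y_logit = self.predictor(h)
        a_logit = self.adversary(GradReverse.apply(h, self.lam))
        return y_logit, a_logit

def training_step(model, optimizer, x, y, race):
    """One update; y and race are float tensors of 0./1. labels."""
    bce = nn.BCEWithLogitsLoss()
    y_logit, a_logit = model(x)
    # Minimizing both terms trains the predictor on recidivism while the reversed
    # gradient pushes the encoder to strip information the adversary could use
    # to recover race from the shared representation.
    loss = bce(y_logit.squeeze(1), y) + bce(a_logit.squeeze(1), race)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```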

This research builds on existing methodologies: it validates the fairness definitions of Hardt et al. and follows the adversarial approach outlined by Beutel et al., combining theoretical definitions of fairness with a practical application of the corresponding measures. The adversarial model addresses the parity and accuracy challenges while improving predictive performance in a setting shaped by historical bias, where procedural fairness is critical.

While the authors come close to satisfying parity and equality of odds, their model's deviation from fair calibration marks an area for improvement. They also advocate broader feature usage in adversarial models trained on richer datasets: their feature importance analysis shows the model drawing on a diverse set of inputs, in contrast to the stark dependence on a few features seen in conventional models.
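
A minimal way to inspect the calibration gap noted above is to compare predicted risk with observed recidivism rates within score bins, separately per group. The sketch below does this with NumPy; the binning scheme and function name are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def calibration_by_group(y_true, y_score, group, n_bins=10):
    """Per-group calibration table: (bin low, bin high, mean predicted risk,
    observed recidivism rate, count). Large per-bin differences between groups
    indicate a calibration gap of the kind discussed above."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    table = {}
    for g in np.unique(group):
        m = group == g
        rows = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            upper = (y_score < hi) if hi < 1.0 else (y_score <= hi)   # include 1.0 in the last bin
            in_bin = m & (y_score >= lo) & upper
            if in_bin.any():
                rows.append((lo, hi, y_score[in_bin].mean(), y_true[in_bin].mean(), int(in_bin.sum())))
        table[g] = rows
    return table
```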

In sum, the paper presents adversarial learning not just as a theoretical ideal but as a workable answer to immediate fairness concerns in AI-driven prediction, recidivism in particular. Future work could extend these techniques to other fairness criteria and apply them across varied domains to evaluate their broader impact, while maintaining or improving predictive accuracy and fairness metrics. The approach offers a path toward responsibly guiding AI in sensitive real-world applications, and it underscores the continuing need for fairness-centric model design and rigorous evaluation to counter inequalities propagated by algorithmic processes.