Equal Confusion Fairness: Measuring Group-Based Disparities in Automated Decision Systems (2307.00472v1)
Abstract: As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving growing attention from researchers and practitioners. Fairness, which concerns eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, the literature offers a plethora of metrics that adopt different, and often incompatible, perspectives and assumptions. This work focuses on group fairness. Most group fairness metrics seek parity between selected statistics computed from the confusion matrices of different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study of COMPAS, a controversial automated decision system employed in the US to assist judges in assessing recidivism risks. Overall, the methods and metrics provided here may be used to assess the fairness of automated decision systems as part of a more extensive accountability assessment, such as one based on the system accountability benchmark.
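The abstract describes the approach only at a high level. The Python sketch below illustrates one plausible reading of it: a chi-square test of homogeneity over confusion-matrix cells across sensitive groups as the "equal confusion" test, a total-variation distance between normalized group confusion matrices as the parity error, and adjusted standardized residuals (in the spirit of Haberman, 1973) as the post hoc step. These particular formulas and function names are assumptions for illustration, not the paper's exact definitions.

```python
# Hypothetical sketch of an "equal confusion"-style fairness check.
# The specific test, parity metric, and residual formulas are assumptions,
# not the paper's exact formulations.
import numpy as np
from scipy.stats import chi2_contingency


def confusion_counts(y_true, y_pred):
    """Return TP, FP, FN, TN counts for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return np.array([
        np.sum(y_true & y_pred),    # true positives
        np.sum(~y_true & y_pred),   # false positives
        np.sum(y_true & ~y_pred),   # false negatives
        np.sum(~y_true & ~y_pred),  # true negatives
    ])


def equal_confusion_test(y_true, y_pred, groups):
    """Chi-square test of homogeneity over confusion cells across groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    table = np.array([
        confusion_counts(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    ])  # rows = sensitive groups, columns = TP / FP / FN / TN
    stat, p_value, dof, expected = chi2_contingency(table)
    return stat, p_value, table, expected


def confusion_parity_error(table):
    """Mean total-variation distance between each group's normalized
    confusion distribution and the pooled (overall) distribution."""
    group_dist = table / table.sum(axis=1, keepdims=True)
    overall_dist = table.sum(axis=0) / table.sum()
    return float(np.mean(0.5 * np.abs(group_dist - overall_dist).sum(axis=1)))


def adjusted_residuals(table, expected):
    """Post hoc step: adjusted standardized residuals; cells with
    |residual| well above ~2 flag where the disparity comes from."""
    n = table.sum()
    row_frac = table.sum(axis=1, keepdims=True) / n
    col_frac = table.sum(axis=0, keepdims=True) / n
    return (table - expected) / np.sqrt(expected * (1 - row_frac) * (1 - col_frac))


# Example usage on synthetic data:
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
groups = rng.choice(["A", "B"], size=1000)
stat, p, table, expected = equal_confusion_test(y_true, y_pred, groups)
print(p, confusion_parity_error(table))
print(adjusted_residuals(table, expected))
```

A small p-value would indicate that the confusion-matrix cell distributions differ across groups, and the residuals point to which group and which cell (e.g., false positives for one group) drives the difference, mirroring the test/metric/post-hoc workflow the abstract outlines.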
- J. Angwin, J. Larson, S. Mattu, and L. Kirchner, “Machine bias.” May 23, 2016. Accessed: May 19, 2022. [Online]. Available: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- M. van Bekkum and F. Z. Borgesius, “Digital welfare fraud detection and the Dutch SyRI judgment,” European Journal of Social Security, vol. 23, no. 4, pp. 323–340, Dec. 2021, doi: 10.1177/13882627211031257.
- D. Castelvecchi, “Is facial recognition too biased to be let loose?,” Nature, vol. 587, no. 7834, pp. 347–349, Nov. 2020, doi: 10.1038/d41586-020-03186-4.
- M. Raghavan, S. Barocas, J. Kleinberg, and K. Levy, “Mitigating bias in algorithmic hiring: evaluating claims and practices,” in Proc. Conference on Fairness, Accountability, and Transparency, New York, NY, Jan. 27, 2020, pp. 469–481. doi: 10.1145/3351095.3372828.
- J. McLean and R. Mackenzie, “Digital justice in Australian visa application processes?,” Alternative Law Journal, vol. 44, no. 4, pp. 291–296, Dec. 2019, doi: 10.1177/1037969X19853685.
- A. Choudhury and O. Asan, “Role of artificial intelligence in patient safety outcomes: systematic literature review,” JMIR Medical Informatics, vol. 8, no. 7, p. e18599, Jul. 2020, doi: 10.2196/18599.
- F. Gursoy and I. A. Kakadiaris, “System cards for AI-based decision-making for public policy.” arXiv, Mar. 01, 2022. doi: 10.48550/arXiv.2203.04754.
- Information Commissioner’s Office, “Guidance on the AI auditing framework: draft guidance for consultation.” Information Commissioner’s Office, Feb. 2020. Accessed: Jan. 07, 2022. [Online]. Available: https://ico.org.uk/about-the-ico/ico-and-stakeholder-consultations/ico-consultation-on-the-draft-ai-auditing-framework-guidance-for-organisations/
- N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, “A survey on bias and fairness in machine learning,” ACM Comput. Surv., vol. 54, no. 6, pp. 115:1–115:35, Jul. 2021, doi: 10.1145/3457607.
- M. Samorani, S. L. Harris, L. G. Blount, H. Lu, and M. A. Santoro, “Overbooked and overlooked: machine learning and racial bias in medical appointment scheduling,” Manufacturing & Service Operations Management, Aug. 2021, doi: 10.1287/msom.2021.0999.
- M. O. R. Prates, P. H. Avelar, and L. C. Lamb, “Assessing gender bias in machine translation: a case study with Google Translate,” Neural Computing and Applications, vol. 32, no. 10, pp. 6363–6381, May 2020, doi: 10.1007/s00521-019-04144-6.
- C. H. Chu, R. Nyrup, K. Leslie, J. Shi, A. Bianchi, A. Lyn, M. McNicholl, S. Khan, S. Rahimi, and A. Grenier, “Digital ageism: challenges and opportunities in artificial intelligence for older adults,” The Gerontologist, Jan. 2022, doi: 10.1093/geront/gnab167.
- M. Whittaker, M. Alper, O. College, L. Kaziunas, and M. R. Morris, “Disability, bias, and AI.” Nov. 2019. Accessed: May 19, 2022. [Online]. Available: https://ainowinstitute.org/disabilitybiasai-2019.pdf
- U. Peters, “Algorithmic political bias in artificial intelligence systems,” Philosophy & Technology, vol. 35, no. 2, p. 25, Mar. 2022, doi: 10.1007/s13347-022-00512-8.
- A. Abid, M. Farooqi, and J. Zou, “Persistent anti-Muslim bias in large language models,” in Proc. AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, Jul. 21, 2021, pp. 298–306. doi: 10.1145/3461702.3462624.
- K. Makhlouf, S. Zhioua, and C. Palamidessi, “On the applicability of machine learning fairness notions,” ACM SIGKDD Explorations Newsletter, vol. 23, no. 1, pp. 14–23, May 2021, doi: 10.1145/3468507.3468511.
- A. Castelnovo, R. Crupi, G. Greco, D. Regoli, I. G. Penco, and A. C. Cosentini, “A clarification of the nuances in the fairness metrics landscape,” Scientific Reports, vol. 12, no. 1, p. 4209, Mar. 2022, doi: 10.1038/s41598-022-07939-1.
- S. Verma and J. Rubin, “Fairness definitions explained,” in Proc. International Workshop on Software Fairness, New York, NY, May 29, 2018, pp. 1–7. doi: 10.1145/3194770.3194776.
- R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilović, S. Nagar, K. N. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, and Y. Zhang, “AI Fairness 360: an extensible toolkit for detecting and mitigating algorithmic bias,” IBM Journal of Research and Development, vol. 63, no. 4/5, pp. 4:1–4:15, Jul. 2019, doi: 10.1147/JRD.2019.2942287.
- S. Segal, Y. Adi, B. Pinkas, C. Baum, C. Ganesh, and J. Keshet, “Fairness in the eyes of the data: certifying machine-learning models,” in Proc. AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, Jul. 21, 2021, pp. 926–935. doi: 10.1145/3461702.3462554.
- A. Chouldechova and A. Roth, “A snapshot of the frontiers of fairness in machine learning,” Communications of the ACM, vol. 63, no. 5, pp. 82–89, Apr. 2020, doi: 10.1145/3376898.
- T. Räz, “Group fairness: independence revisited,” in Proc. ACM Conference on Fairness, Accountability, and Transparency, New York, NY, Mar. 3, 2021, pp. 129–137. doi: 10.1145/3442188.3445876.
- R. Berk, H. Heidari, S. Jabbari, M. Kearns, and A. Roth, “Fairness in criminal justice risk assessments: the state of the art,” Sociological Methods & Research, vol. 50, no. 1, pp. 3–44, 2021, doi: 10.1177/0049124118782533.
- J. R. Foulds, R. Islam, K. Keya, and S. Pan, “An intersectional definition of fairness,” in Proc. International Conference on Data Engineering, Los Alamitos, CA, Apr. 2020, pp. 1918–1921. doi: 10.1109/ICDE48307.2020.00203.
- M. Kearns, S. Neel, A. Roth, and Z. S. Wu, “Preventing fairness gerrymandering: auditing and learning for subgroup fairness,” in Proc. International Conference on Machine Learning, Jul. 3, 2018, pp. 2564–2572. [Online]. Available: https://proceedings.mlr.press/v80/kearns18a.html
- B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA) - Protein Structure, vol. 405, no. 2, pp. 442–451, Oct. 1975, doi: 10.1016/0005-2795(75)90109-9.
- C. J. Ferguson, “An effect size primer: a guide for clinicians and researchers,” Professional Psychology: Research and Practice, vol. 40, no. 5, pp. 532–538, 2009, doi: 10.1037/a0015808.
- D. M. Sharpe, “Your chi-square test is statistically significant: now what?,” Practical Assessment, Research and Evaluation, vol. 20, no. 8, pp. 1–10, 2015.
- S. J. Haberman, “The analysis of residuals in cross-classified tables,” Biometrics, vol. 29, no. 1, pp. 205–220, 1973, doi: 10.2307/2529686.
- P. L. MacDonald and R. C. Gardner, “Type I error rate comparisons of post hoc procedures for I × J chi-square tables,” Educational and Psychological Measurement, vol. 60, no. 5, pp. 735–754, Oct. 2000, doi: 10.1177/00131640021970871.
- W. G. Cochran, “Some methods for strengthening the common χ² tests,” Biometrics, vol. 10, pp. 417–451, 1954, doi: 10.2307/3001616.
- M. K. Cox and C. H. Key, “Post hoc pair-wise comparisons for the chi-square test of homogeneity of proportions,” Educational and Psychological Measurement, vol. 53, no. 4, pp. 951–962, Dec. 1993, doi: 10.1177/0013164493053004008.
- K. Kirkpatrick, “It’s not the algorithm, it’s the data,” Communications of the ACM, vol. 60, no. 2, pp. 21–23, Jan. 2017, doi: 10.1145/3022181.
- J. Larson, S. Mattu, L. Kirchner, and J. Angwin, “How we analyzed the COMPAS recidivism algorithm.” Accessed: May 19, 2022. [Online]. Available: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
- E. Israni and E. Chang, “Algorithmic due process: mistaken accountability and attribution in State v. Loomis.” Aug. 31, 2017. Accessed: Apr. 14, 2022. [Online]. Available: https://jolt.law.harvard.edu/digest/algorithmic-due-process-mistaken-accountability-and-attribution-in-state-v-loomis-1
- Harvard Law Review, “State v. Loomis.” Mar. 10, 2017. Accessed: May 19, 2022. [Online]. Available: https://harvardlawreview.org/2017/03/state-v-loomis/
- “Loomis v. Wisconsin.” Jun. 26, 2017. Accessed: May 19, 2022. [Online]. Available: https://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/
- W. Dieterich, C. Mendoza, and T. Brennan, “COMPAS risk scales: demonstrating accuracy equity and predictive parity.” Jul. 08, 2016. Accessed: May 19, 2022. [Online]. Available: https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf
- A. L. Washington, “How to argue with an algorithm: lessons from the COMPAS ProPublica debate,” The Colorado Technology Law Journal, vol. 17, no. 1, pp. 131–160, Apr. 2019.
- E. Jackson and C. Mendoza, “Setting the record straight: what the COMPAS core risk and need assessment is and is not.” Mar. 31, 2020. Accessed: May 19, 2022. [Online]. Available: https://hdsr.mitpress.mit.edu/pub/hzwo7ax4/
- ProPublica, “Data and analysis for ‘machine bias.’” May 20, 2022. Accessed: May 19, 2022. [Online]. Available: https://github.com/propublica/compas-analysis
- FBI, “Violent crime.” Accessed: Apr. 12, 2022. [Online]. Available: https://ucr.fbi.gov/crime-in-the-u.s/2010/crime-in-the-u.s.-2010/violent-crime/violent-crime
- Northpointe, “Practitioners guide to COMPAS.” 2012. Accessed: Apr. 14, 2022. [Online]. Available: https://njoselson.github.io/pdfs/FieldGuide2_081412.pdf
- Furkan Gursoy
- Ioannis A. Kakadiaris