De-Biasing the Bias: Methods for Improving Disparity Assessments with Noisy Group Measurements
Abstract: Health care decisions are increasingly informed by clinical decision support algorithms, but these algorithms may perpetuate or increase racial and ethnic disparities in access to and quality of health care. Further complicating the problem, clinical data often have missing or poor-quality racial and ethnic information, which can lead to misleading assessments of algorithmic bias. We present novel statistical methods that allow probabilities of racial/ethnic group membership to be used in assessments of algorithm performance, and that quantify the statistical bias resulting from error in these imputed group probabilities. We propose a sensitivity analysis approach to estimating the statistical bias, which allows practitioners to assess disparities in algorithm performance under a range of assumed levels of group probability error. We also prove theoretical bounds on the statistical bias for a set of commonly used fairness metrics and describe real-world scenarios where our theoretical results are likely to apply. We present a case study that uses race and ethnicity imputed by the Bayesian Improved Surname Geocoding (BISG) algorithm to estimate disparities in a clinical decision support algorithm used to inform osteoporosis treatment. Our methods allow policymakers to understand the range of potential disparities under a given algorithm even when race and ethnicity information is missing, and to make informed decisions about implementing machine learning for clinical decision support.
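As a rough illustration of what using group membership probabilities in a performance assessment can look like, the sketch below computes a probability-weighted true positive rate for each of two groups and takes their difference. This is a generic weighted estimator chosen for illustration, not necessarily the estimator developed in the paper. The function name `weighted_tpr`, the two-group setup, and the toy data are all assumptions.

```python
from typing import Sequence


def weighted_tpr(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group_prob: Sequence[float],
) -> float:
    """Probability-weighted true positive rate for one group.

    Each record contributes to the group in proportion to its imputed
    membership probability instead of being hard-assigned to a group.
    (Hypothetical helper for illustration only.)
    """
    if not (len(y_true) == len(y_pred) == len(group_prob)):
        raise ValueError("inputs must have the same length")
    # Weighted count of true positives among actual positives.
    numerator = sum(p * t * h for t, h, p in zip(y_true, y_pred, group_prob))
    # Weighted count of actual positives.
    denominator = sum(p * t for t, p in zip(y_true, group_prob))
    if denominator == 0:
        raise ValueError("no weighted positives for this group")
    return numerator / denominator


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 1, 0]              # observed outcomes
y_pred = [1, 0, 1, 0, 1, 1]              # algorithm decisions
prob_a = [0.9, 0.8, 0.2, 0.5, 0.1, 0.3]  # imputed P(group A) per record
prob_b = [1.0 - p for p in prob_a]       # two-group case: P(group B)

tpr_a = weighted_tpr(y_true, y_pred, prob_a)
tpr_b = weighted_tpr(y_true, y_pred, prob_b)
disparity = tpr_a - tpr_b  # estimated TPR gap between the two groups
```

When every probability is exactly 0 or 1, the weighted estimate reduces to the usual per-group true positive rate. With noisy probabilities the estimate can be statistically biased, which is the situation the sensitivity analysis and bounds described in the abstract are meant to address.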