
Assessing the Impact of Case Correction Methods on the Fairness of COVID-19 Predictive Models (2405.10355v1)

Published 16 May 2024 in physics.soc-ph and cs.CY

Abstract: One of the central difficulties of addressing the COVID-19 pandemic has been accurately measuring and predicting the spread of infections. In particular, official COVID-19 case counts in the United States are undercounts of actual caseloads due to the absence of universal testing policies. Researchers have proposed a variety of methods for recovering true caseloads, often through the estimation of statistical models on more reliable measures, such as death and hospitalization counts, positivity rates, and demographics. However, given the disproportionate impact of COVID-19 on marginalized racial, ethnic, and socioeconomic groups, it is important to consider potential unintended effects of case correction methods on these groups. Thus, we investigate two of these correction methods for their impact on a downstream COVID-19 case prediction task. For that purpose, we tailor an auditing approach and evaluation protocol to analyze the fairness of the COVID-19 prediction task by measuring the difference in model performance between majority-White counties and majority-minority counties. We find that one of the correction methods improves fairness, decreasing differences in performance between majority-White and majority-minority counties, while the other method increases differences, introducing bias. While these results are mixed, it is evident that correction methods have the potential to exacerbate existing biases in COVID-19 case data and in downstream prediction tasks. Researchers planning to develop or use case correction methods must be careful to consider negative effects on marginalized groups.
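
The abstract describes the auditing approach only at a high level: train a county-level case predictor and compare its error on majority-White versus majority-minority counties. The sketch below is a minimal illustration of that kind of group-wise performance gap on synthetic data; the model choice (Ridge), the feature set, and the MAE metric are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch of a group-wise fairness audit for a county-level case
# predictor. All data, features, and the model below are illustrative
# stand-ins, not the paper's actual pipeline.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic per-county features (e.g. deaths, hospitalizations, positivity,
# demographics) and a corrected case-count target.
n_counties = 500
X = rng.normal(size=(n_counties, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=n_counties)
# True where the county is majority-minority (hypothetical labels).
majority_minority = rng.random(n_counties) < 0.3

X_tr, X_te, y_tr, y_te, grp_tr, grp_te = train_test_split(
    X, y, majority_minority, test_size=0.3, random_state=0
)

model = Ridge().fit(X_tr, y_tr)
pred = model.predict(X_te)

# Error within each group of counties, and the gap between them.
mae_minority = mean_absolute_error(y_te[grp_te], pred[grp_te])
mae_white = mean_absolute_error(y_te[~grp_te], pred[~grp_te])
print(f"MAE, majority-minority counties: {mae_minority:.3f}")
print(f"MAE, majority-White counties:    {mae_white:.3f}")
print(f"Performance gap (minority - White): {mae_minority - mae_white:.3f}")
```

Under this framing, a case correction method improves fairness if it shrinks the printed gap and introduces bias if it widens it, which is how the abstract characterizes the two methods it evaluates.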

