Access to care improves EHR reliability and clinical risk prediction model performance

Published 10 Dec 2024 in cs.CY | (2412.07712v2)

Abstract: Disparities in access to healthcare have been well-documented in the United States, but their effects on electronic health record (EHR) data reliability and resulting clinical models are poorly understood. Using an All of Us dataset of 134,513 participants, we investigate the effects of access to care on the medical machine learning pipeline, including medical condition rates, data quality, outcome label accuracy, and prediction performance. Our findings reveal that patients with cost-constrained or delayed care have worse EHR reliability as measured by patient self-reported conditions for 78% of examined medical conditions. We demonstrate in a prediction task of Type II diabetes incidence that clinical risk predictive performance can be worse for patients without standard care, with balanced accuracy gaps of 3.6 and sensitivity gaps of 9.4 percentage points for those with cost-constrained or delayed care. We evaluate solutions to mitigate these disparities and find that including patient self-reported conditions improved performance for patients with lower access to care, with 11.2 percentage points higher sensitivity, effectively decreasing the performance gap between standard versus delayed or cost-constrained care. These findings provide the first large-scale evidence that healthcare access systematically affects both data reliability and clinical prediction performance. By revealing how access barriers propagate through the medical machine learning pipeline, our work suggests that improving model equity requires addressing both data collection biases and algorithmic limitations. More broadly, this analysis provides an empirical foundation for developing clinical prediction systems that work effectively for all patients, regardless of their access to care.


Summary

The paper demonstrates that limited healthcare access is associated with lower EHR reliability: for 78% of the conditions examined, EHRs agree less well with patients' self-reported conditions among those with cost-constrained or delayed care.
The paper shows that Type II diabetes prediction models perform worse for the cost-constrained and delayed care groups, with sensitivity as low as 57.0% in the delayed care group versus 66.4% in the standard care group.
Incorporating self-reported conditions into prediction models improves sensitivity by 11.2 percentage points, reducing performance gaps with standard care groups.

Implications of Healthcare Access on EHR Reliability and Clinical Risk Prediction

This paper investigates how healthcare access affects the reliability of electronic health records (EHRs) and the performance of clinical risk prediction models, using data from the National Institutes of Health's All of Us program. The study traces how access disparities propagate through the medical machine learning pipeline, from clinical data reliability to predictive accuracy.

The research uses a diverse cohort of 134,513 participants from the All of Us dataset, which combines self-reported healthcare access data with EHR information. Participants are categorized into three access groups: standard care, cost-constrained care, and delayed care. The study focuses on examining the effects of healthcare access on the predictive accuracy of Type II diabetes incidence models, as this condition presents widespread challenges and opportunities for early intervention.

Key Findings and Quantitative Results

The study uncovers several significant outcomes:

EHR Data Reliability: Patients with low access to care have less reliable EHRs when their records are checked against their self-reported conditions. Rates of missing EHR diagnoses are statistically significantly higher for 29 of the 37 conditions examined (78%) among participants with cost-constrained or delayed care, suggesting that these patients' medical histories are systematically under-documented.
Predictive Performance: The incidence prediction model for Type II diabetes reveals lower balanced accuracy and sensitivity for the cost-constrained and delayed care groups compared to the standard care group. Notably, the predictive model only identified 57.0% of new diabetes cases in the delayed care group, compared to 66.4% in the standard care group.
Mitigation Strategies: Including self-reported conditions in prediction models significantly improved sensitivity and balanced accuracy for individuals with low healthcare access. For example, sensitivity improved by 11.2 percentage points for the cost-constrained group, narrowing the performance gap with the standard care group.
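
The two performance metrics reported above can be made concrete with a small sketch. Sensitivity is the share of true cases the model flags, and balanced accuracy is the mean of sensitivity and specificity; the gap is the difference between the standard care group and a lower-access group. The function name, group labels, and toy labels and predictions below are assumptions for illustration only, not the study's code or data.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Return {group: (sensitivity, balanced_accuracy)} for binary labels.

    Assumes every group contains at least one positive and one negative
    example; otherwise the divisions below would fail.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "tn" if p == 0 else "fp"
        counts[g][key] += 1

    result = {}
    for g, c in counts.items():
        sensitivity = c["tp"] / (c["tp"] + c["fn"])
        specificity = c["tn"] / (c["tn"] + c["fp"])
        result[g] = (sensitivity, (sensitivity + specificity) / 2)
    return result


# Toy data (hypothetical): 8 participants per access group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 0, 0, 0, 1,   # standard care
          1, 1, 0, 0, 0, 0, 0, 1]   # delayed care
groups = ["standard"] * 8 + ["delayed"] * 8

metrics = group_metrics(y_true, y_pred, groups)
sens_gap = metrics["standard"][0] - metrics["delayed"][0]
bal_acc_gap = metrics["standard"][1] - metrics["delayed"][1]
print(metrics, sens_gap, bal_acc_gap)
```

In this toy example the delayed care group has lower sensitivity at the same specificity, so both gaps are positive; the study reports gaps of this kind in percentage points.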

Theoretical and Practical Implications

The study provides empirical evidence of the need to incorporate considerations of healthcare access into the development and deployment of clinical prediction models. It underlines the potential for systematic biases in the feature data and outcome labels that detrimentally affect predictive performance for underserved populations. The research advocates for enriched datasets that include patient self-reported conditions, which offer additional insights not captured by traditional EHR data.
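
One way to picture the reliability comparison between self-reports and EHR data is to treat a self-reported condition as the reference and count how often the EHR lacks a matching diagnosis, separately for each access group. The record layout, field names, and toy values below are assumptions for illustration, not the study's data or method; the paper also tests these differences for statistical significance, which this sketch omits.

```python
def missing_ehr_rate(records, condition, group):
    """Share of participants in `group` who self-report `condition`
    but have no matching diagnosis in their EHR.

    Returns None when nobody in the group self-reports the condition.
    """
    reported = [
        r for r in records
        if r["group"] == group and condition in r["self_reported"]
    ]
    if not reported:
        return None
    missing = sum(1 for r in reported if condition not in r["ehr"])
    return missing / len(reported)


# Toy records (hypothetical): each has an access group, a set of
# self-reported conditions, and a set of EHR-documented conditions.
records = [
    {"group": "standard", "self_reported": {"asthma"}, "ehr": {"asthma"}},
    {"group": "standard", "self_reported": {"asthma"}, "ehr": {"asthma"}},
    {"group": "standard", "self_reported": {"asthma"}, "ehr": set()},
    {"group": "standard", "self_reported": {"asthma"}, "ehr": {"asthma"}},
    {"group": "delayed", "self_reported": {"asthma"}, "ehr": set()},
    {"group": "delayed", "self_reported": {"asthma"}, "ehr": {"asthma"}},
    {"group": "delayed", "self_reported": set(), "ehr": set()},
]

standard_rate = missing_ehr_rate(records, "asthma", "standard")
delayed_rate = missing_ehr_rate(records, "asthma", "delayed")
print(standard_rate, delayed_rate)
```

A higher rate in a lower-access group than in the standard care group, repeated across conditions, is the pattern the study describes as worse EHR reliability.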

From a theoretical perspective, the research contributes to the growing body of knowledge addressing algorithmic bias within clinical predictions. Unlike previous studies that often focus on bias linked to demographic factors such as race and gender, this paper highlights access as a critical variable.

Future Directions

The study's findings suggest avenues for future research to enhance model equity and performance. Notably, further exploration is warranted regarding methodological improvements that do not necessitate new data collection. Potential strategies include imputation techniques for missing data and leveraging proxies for healthcare access present in existing EHR datasets, such as clinical notes and social determinants of health data.

Additionally, understanding the downstream clinical effects of reduced model accuracy for those with limited access to care will be crucial in refining predictive systems and healthcare delivery strategies. Ultimately, integrating diverse data types into machine learning models may offer a comprehensive approach to addressing disparities in healthcare outcomes.

In conclusion, this research provides a foundational understanding of how healthcare access impacts EHR reliability and machine learning in healthcare, highlighting critical areas for future development and intervention to ensure equitable and effective clinical decision-making tools.
