Multiple Imputation Diagnostics when using Electronic Health Record Data in Observational Studies: A Case Study

Published 12 Apr 2026 in stat.ME | (2604.10706v1)

Abstract: Missing values in electronic health record (EHR) data pose a significant challenge for epidemiologic research. Traditional methods for handling missing data, like mean imputation, may introduce bias. Multiple imputation (MI) offers a principled solution by generating multiple plausible values based on statistical models. However, MI requires careful model specification and validation of imputations, ideally using multivariate graphical tools. We demonstrate the application of such tools to validate MI in a study of chronic kidney disease, assessing cardiovascular outcomes linked to neighborhood socioeconomic status (nSES). This study used data from Duke University Health System (DUHS) and Lincoln Community Health Center (LCHC). Eligible patients had at least one encounter within DUHS or LCHC and had two estimated glomerular filtration rate (eGFR) values <60 mL/min per 1.73 m² more than 90 days apart between January 1, 2007 and July 1, 2008. Socioeconomic status was assessed using the Agency for Healthcare Research and Quality (AHRQ) index based on census data. The main outcome was a cardiovascular disease-related hospitalization. Participants were mostly older (mean age 73 years), female (64%), and Black (43%). Participants living in lower nSES neighborhoods had higher mean systolic blood pressure (SBP: 140 mmHg) and hemoglobin A1c (HbA1c) levels (7.1%) as compared to participants living in higher nSES neighborhoods. A machine learning based approach, Classification and Regression Trees (CART), was the preferred approach to impute missing data. The distributions of imputed values of SBP and HbA1c were impacted by whether marginal or conditional values of SBP and HbA1c were imputed. The choice of MI had minimal impact on inference and prediction. Future research may want to extend our results and consider how results may differ when using EHR data from multiple health systems.

Summary

  • The paper demonstrates that conditional multiple imputation diagnostics significantly improve the evaluation of imputation methods over traditional marginal approaches in high-dimensional EHR data.
  • The study compares MI engines—PMM, NORM, and CART—revealing that CART with stratified imputation offers superior fit under nonrandom missingness patterns.
  • The findings underscore that rigorous, domain-informed diagnostics are critical for reliable inference in observational epidemiological research using EHR data.

Multiple Imputation Diagnostics in EHR-based Observational Studies: Methodological Insights and Case Study Analysis

Motivation and Study Context

Electronic health records (EHRs) are increasingly used for epidemiological analyses, yet systematic missingness in EHR-derived variables threatens both internal validity and statistical efficiency in observational studies. Traditional missing data procedures, such as mean imputation or complete-case analysis, are known to be either inefficient or to introduce bias, especially when missingness correlates with the underlying data-generating mechanism. Multiple imputation (MI), when the imputation model is carefully specified, offers a route to valid statistical inference by propagating the uncertainty due to missingness. However, rigorous diagnostics for the adequacy of MI remain underused in real-world biomedical informatics, especially for complex, high-dimensional EHR data with potentially multilevel and nonlinear relationships.

This case study interrogates MI diagnostics in a well-characterized cohort with chronic kidney disease (CKD) derived from DUHS and LCHC, focusing specifically on cardiovascular hospitalization risk conditioned on neighborhood socioeconomic status (nSES). The investigation substantively examines MI performance under distinct model classes for imputation—including parametric, PMM, and tree-based methods—with an emphasis on diagnostic rigor using both marginal and conditional graphical tools.

Data Architecture and Missingness Structure

The analytic cohort consisted of 4,433 individuals with CKD, as indexed by sustained reduction in eGFR, across two large health systems. Extensive covariate data, encompassing demographics, comorbidities, laboratory indices, and medication usage, were extracted from linked EHRs. The study's endpoint, incident cardiovascular hospitalization, was adjudicated via diagnosis codes.

Crucially, missing data were non-trivially distributed, with up to 48% missingness for anthropometric measures, 29% for blood pressure, and 17% for lipid profiles. This pattern underscores the necessity of principled handling for both efficient parameter estimation and unbiased inference in nSES-outcome associations.
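A per-variable missingness tabulation of the kind summarized above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `missing_fraction`, the column names, and the toy records are all hypothetical, and missing values are assumed to be encoded as `None`.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row]) -> Dict[str, float]:
    """Return the fraction of missing (None or absent) entries per column."""
    if not rows:
        return {}
    columns = sorted({name for row in rows for name in row})
    n = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / n
        for col in columns
    }


# Toy records for illustration only; these are NOT study data.
toy_rows: List[Row] = [
    {"sbp": 142.0, "bmi": None, "ldl": 101.0},
    {"sbp": None, "bmi": None, "ldl": 88.0},
    {"sbp": 128.0, "bmi": 27.5, "ldl": None},
    {"sbp": 135.0, "bmi": 31.0, "ldl": 120.0},
]

fractions = missing_fraction(toy_rows)
for column, fraction in fractions.items():
    print(f"{column}: {fraction:.0%} missing")
```

In practice the same tabulation would usually be repeated within clinically relevant subgroups, since the amount of missingness can differ sharply between them.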

Multiple Imputation Methodology and Diagnostics

MI Specifications

Three MI engines within MICE were contrasted:

  • Predictive Mean Matching (PMM) for continuous variables,
  • Bayesian linear regression under a normal model (NORM),
  • Classification and Regression Trees (CART) for all variable types.

For each, chained equations were constructed, and model form/conditioning was informed by clinical domain knowledge. Sequential conditional modeling was specifically used for highly collinear lipid variables, avoiding distributional collapse induced by naive model specification.
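To make the matching idea behind PMM concrete, the sketch below imputes one continuous variable from one predictor. It is a deliberately simplified, single-pass illustration and not the MICE implementation: it uses ordinary least squares without drawing regression parameters from their posterior, and it does not cycle over variables as chained equations do. The names `fit_line` and `pmm_impute`, the donor count `k`, and the toy age/SBP values are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Fill missing y by borrowing an observed y from one of the k donors
    whose predicted value is closest to the recipient's predicted value."""
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line(
        [xi for xi, _ in observed], [yi for _, yi in observed]
    )
    # Each donor carries its predicted value and its actual observed value.
    donors = [(intercept + slope * xi, yi) for xi, yi in observed]
    completed: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = intercept + slope * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        completed.append(rng.choice(nearest)[1])
    return completed


# Toy values for illustration only; these are NOT study data.
toy_age = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
toy_sbp = [120.0, None, 130.0, 135.0, None, 150.0]
print(pmm_impute(toy_age, toy_sbp, k=2))
```

Because every imputed value is copied from an observed donor, PMM cannot produce values outside the observed range, but it can still borrow from clinically inappropriate donors when the predictor set does not separate the relevant subgroups.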

Diagnostic Strategy

Standard marginal diagnostics, which compare the overall distributions of observed and imputed values, were found insufficient. For instance, PMM and CART imputed HbA1c values that were distributed lower than the observed values. Read naively, this gap would argue against these models, but clinical knowledge indicated that HbA1c was missing mostly in individuals without diabetes, for whom lower values are expected. Plausibility could therefore only be judged within strata defined by diabetes (for HbA1c) or hypertension (for SBP).
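The contrast between a marginal and a stratum-specific comparison can be illustrated numerically. The sketch below compares observed and imputed means overall and within strata. The paper relies on graphical tools, so this numeric summary is only a stand-in, and the function name `conditional_summary`, the stratum labels, and the toy HbA1c values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float, bool]  # (stratum label, value, was_imputed)


def conditional_summary(
    records: List[Record],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Mean of observed and of imputed values within each stratum."""
    groups: Dict[str, Dict[str, List[float]]] = {}
    for stratum, value, was_imputed in records:
        bucket = groups.setdefault(stratum, {"observed": [], "imputed": []})
        bucket["imputed" if was_imputed else "observed"].append(value)
    return {
        stratum: {
            kind: (mean(values) if values else None)
            for kind, values in bucket.items()
        }
        for stratum, bucket in groups.items()
    }


# Toy HbA1c values (%) for illustration only; these are NOT study data.
toy_records: List[Record] = [
    ("diabetes", 7.8, False),
    ("diabetes", 8.2, False),
    ("diabetes", 8.0, True),
    ("no diabetes", 5.6, False),
    ("no diabetes", 5.4, False),
    ("no diabetes", 5.5, True),
    ("no diabetes", 5.7, True),
]

# Marginal view: ignore the stratum by relabelling every record as "all".
marginal = conditional_summary([("all", v, imp) for _, v, imp in toy_records])
by_stratum = conditional_summary(toy_records)
print(marginal)
print(by_stratum)
```

In this toy example the imputed mean sits below the observed mean marginally, yet observed and imputed means nearly agree within each stratum, because most of the imputed values belong to the lower-valued stratum.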

Conditional diagnostic plots, which partition the observed and imputed values on these classifying variables, provided critical insight. Notably, PMM and NORM yielded implausibly high HbA1c imputations among those without diabetes and out-of-range SBP values for those without hypertension, exposing failures of these model classes. CART showed better conditional fidelity but still overestimated SBP in participants without hypertension until further refinement. A stratified imputation approach, which models subgroups defined by comorbidity/medication status separately, fully resolved these discrepancies.
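The structural idea of stratified imputation, that donors for a missing value come only from the same subgroup, can be sketched as below. A simple random hot-deck draw within each stratum is swapped in here for the imputation engine. It is not CART and not the authors' procedure, and the name `stratified_hot_deck`, the stratum labels, and the toy SBP values are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[float]]  # (stratum label, value or None)


def stratified_hot_deck(
    records: List[Record], rng: Optional[random.Random] = None
) -> List[Tuple[str, float]]:
    """Fill each missing value with a random observed value drawn from
    the same stratum, so no information is borrowed across strata."""
    rng = rng or random.Random(0)
    donors: Dict[str, List[float]] = {}
    for stratum, value in records:
        if value is not None:
            donors.setdefault(stratum, []).append(value)
    completed: List[Tuple[str, float]] = []
    for stratum, value in records:
        if value is None:
            pool = donors.get(stratum)
            if not pool:
                raise ValueError(f"no observed donors in stratum {stratum!r}")
            value = rng.choice(pool)
        completed.append((stratum, value))
    return completed


# Toy SBP values (mmHg) for illustration only; these are NOT study data.
toy_sbp_records: List[Record] = [
    ("hypertension", 150.0),
    ("hypertension", 144.0),
    ("hypertension", None),
    ("no hypertension", 118.0),
    ("no hypertension", 122.0),
    ("no hypertension", None),
]

filled = stratified_hot_deck(toy_sbp_records, rng=random.Random(42))
print(filled)
```

By construction, an imputed SBP for a participant without hypertension can only take a value observed in that subgroup, which is the property that an unstratified model does not guarantee.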

Empirical Results and Inferential Robustness

Demographically, lower nSES tertiles displayed higher proportions of Black participants, higher mean SBP, and increased diabetes and hypertension prevalence. Patterns of missingness and observed marginal covariate values suggested pronounced nonrandom missing data mechanisms.

Hazard ratios for CVD admission as a function of nSES (adjusted for all covariates) were highly consistent between complete case analyses and those using MI with CART:

  • First vs. highest nSES tertile: HR=1.10 (95% CI: 0.99, 1.22)
  • Second vs. highest nSES tertile: HR=1.07 (95% CI: 0.96, 1.18)

No significant interactions between diabetes subgroup and HbA1c were detected. The probability of 5-year event-free survival was, as anticipated, lower in those with both prevalent diabetes and hypertension, with only marginal differences between high and low nSES.

The key assertion is that, in this context, once diagnostics had been applied and CART was used with conditional or stratified imputation, the specific choice of MI model had minimal impact on inferential and predictive quantities. Poor choices under default implementations (unstratified PMM or NORM) could nonetheless yield substantively incorrect imputations.
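Estimates from multiply imputed datasets are conventionally combined with Rubin's rules, and comparing the pooled results is one way to check whether the choice of imputation model changes inference. The sketch below pools log hazard ratios across imputations. It is an illustration under stated assumptions: the summary does not describe the pooling code used, the per-imputation values are toy numbers rather than study results, the names `pool_rubin` and `hazard_ratio_ci` are hypothetical, and the interval uses a normal approximation (z = 1.96) in place of the t reference distribution that Rubin's rules prescribe.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances, m >= 2")
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    return q_bar, within + (1 + 1 / m) * between


def hazard_ratio_ci(
    log_hr: float, total_var: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Back-transform a pooled log hazard ratio to (lower, HR, upper)."""
    half_width = z * math.sqrt(total_var)
    return (
        math.exp(log_hr - half_width),
        math.exp(log_hr),
        math.exp(log_hr + half_width),
    )


# Toy per-imputation results for illustration only; NOT study results.
toy_log_hrs = [0.09, 0.10, 0.11]
toy_vars = [0.0028, 0.0028, 0.0028]

pooled_log_hr, pooled_var = pool_rubin(toy_log_hrs, toy_vars)
hr_low, hr, hr_high = hazard_ratio_ci(pooled_log_hr, pooled_var)
print(f"HR={hr:.2f} (95% CI: {hr_low:.2f}, {hr_high:.2f})")
```

The between-imputation term is what carries the uncertainty due to missingness into the final interval. When imputations agree closely, as in the toy values, the pooled variance stays near the within-imputation variance.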

Practical and Theoretical Implications

This study foregrounds the necessity of combining domain-informed conditional diagnostics with flexible, nonparametric MI engines (e.g., CART) to ensure the scientific validity of EHR-based epidemiological inference. Standard marginal model diagnostics are demonstrably insufficient; rigorous assessment requires multivariate, subgroup-directed visualization and evaluation. Default parametric strategies—even as provided in widely used MI software packages—are prone to propagate clinically nonsensical imputations when relationships between variables are nonlinear or strongly determined by latent classes.

From a methodological standpoint, the results imply that:

  • Nonparametric MI (CART) is robust against nonlinearity, interaction effects, and non-normality prevalent in real-world EHR data,
  • MI diagnostics must be conditional, stratified, and incorporate domain-specific mechanisms of missingness and plausible value distribution,
  • Sequential specification strategies are essential for sets of collinear or structurally linked variables.

For practice, the study prescribes that analysts in EHR-based research explicitly integrate MI diagnostic procedures into their workflow and avoid reliance on generic, univariate reporting.

Prospects for Future Directions

Given the current study's scope, which is confined to a single urban area and two health systems, extending these MI diagnostic frameworks to federated or more heterogeneous EHR environments, especially those with divergent missingness patterns and multilevel structures, warrants further study. As the AI and clinical data science community continues to scale analyses across distributed EHR corpora, integrating robust, flexible, and diagnostically validated MI procedures will be essential, and automated multimodal diagnostics could be a focus for methodological innovation.

Conclusion

This study provides a rigorous blueprint for the implementation and evaluation of multiple imputation diagnostics in EHR-based observational analyses. The findings reinforce the critical need for nonparametric imputation strategies (such as CART), domain-aware conditional diagnostics, and careful model specification for multivariate and subgroup-conditional distributions. The advocated workflow ensures validity and reliability in downstream epidemiological inference when using EHR data, with clear implications for future practice as machine learning and AI methods become standard in clinical research pipelines.
