- The paper demonstrates that conditional multiple imputation diagnostics significantly improve the evaluation of imputation methods over traditional marginal approaches in high-dimensional EHR data.
- The study compares MI engines—PMM, NORM, and CART—revealing that CART with stratified imputation offers superior fit under nonrandom missingness patterns.
- The findings underscore that rigorous, domain-informed diagnostics are critical for reliable inference in observational epidemiological research using EHR data.
Multiple Imputation Diagnostics in EHR-based Observational Studies: Methodological Insights and Case Study Analysis
Motivation and Study Context
Electronic health records (EHRs) are increasingly leveraged for epidemiological analyses, yet systematic missingness in EHR-derived variables threatens both internal validity and statistical efficiency in observational studies. Traditional missing data procedures, such as mean imputation or complete-case analysis, are known to be inefficient or to introduce bias, especially when missingness correlates with the underlying data-generating mechanisms. Multiple imputation (MI), particularly when implemented with well-specified imputation models, offers a route to unbiased statistical inference by propagating the uncertainty due to missingness. However, rigorous diagnostics for the adequacy of MI remain underused in real-world biomedical informatics, especially for complex, high-dimensional EHR data with potentially multilevel and nonlinear relationships.
This case study examines MI diagnostics in a well-characterized cohort of patients with chronic kidney disease (CKD) drawn from DUHS and LCHC, focusing on cardiovascular hospitalization risk as a function of neighborhood socioeconomic status (nSES). It compares MI performance across distinct classes of imputation model (parametric regression, PMM, and tree-based methods), with an emphasis on diagnostic rigor using both marginal and conditional graphical tools.
Data Architecture and Missingness Structure
The analytic cohort consisted of 4,433 individuals with CKD, identified by a sustained reduction in eGFR, across two large health systems. Extensive covariate data, encompassing demographics, comorbidities, laboratory indices, and medication usage, were extracted from linked EHRs. The study's endpoint, incident cardiovascular hospitalization, was ascertained via diagnosis codes.
Crucially, missingness was substantial and uneven across variables: up to 48% for anthropometric measures, 29% for blood pressure, and 17% for lipid profiles. Missingness at this scale calls for principled handling to obtain both efficient parameter estimates and unbiased inference on nSES-outcome associations.
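Per-variable missingness of the kind reported above can be tabulated before any imputation is attempted. The following is a minimal sketch in plain Python; the function names, the column names, and the toy records are hypothetical and are not taken from the study.

```python
def missing_fraction(rows, columns):
    """Return {column: fraction of rows where the value is None}."""
    n = len(rows)
    return {c: sum(r.get(c) is None for r in rows) / n for c in columns}


def missing_fraction_by(rows, column, by):
    """Fraction of missing `column` values within each level of `by`."""
    counts = {}
    for r in rows:
        total, miss = counts.get(r[by], (0, 0))
        counts[r[by]] = (total + 1, miss + (r.get(column) is None))
    return {level: miss / total for level, (total, miss) in counts.items()}


# Hypothetical toy records, not study data.
rows = [
    {"diabetes": 1, "hba1c": 7.9, "sbp": 142},
    {"diabetes": 1, "hba1c": 8.4, "sbp": None},
    {"diabetes": 0, "hba1c": None, "sbp": 118},
    {"diabetes": 0, "hba1c": None, "sbp": 121},
    {"diabetes": 0, "hba1c": 5.4, "sbp": None},
]

overall = missing_fraction(rows, ["hba1c", "sbp"])
by_diabetes = missing_fraction_by(rows, "hba1c", "diabetes")
```

Breaking missingness down by a clinical indicator, as `missing_fraction_by` does, is one simple way to see whether a variable is missing mostly in one subgroup.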
Multiple Imputation Methodology and Diagnostics
MI Specifications
Three MI engines within MICE were contrasted:
- Predictive Mean Matching (PMM) for continuous variables,
- Bayesian linear regression (NORM),
- Classification and Regression Trees (CART) for all variable types.
For each engine, chained equations were constructed, with model form and conditioning sets informed by clinical domain knowledge. Sequential conditional modeling was used for the highly collinear lipid variables, avoiding the distributional collapse that a naive model specification can induce.
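To make the PMM engine listed above concrete, here is a minimal sketch of predictive mean matching with a single predictor, in plain Python. The names `fit_line` and `pmm_impute`, the donor count `k=3`, and the toy data are illustrative assumptions, not the study's MICE configuration. The sketch also uses point estimates of the regression, whereas a full PMM implementation draws the coefficients from their posterior before matching.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def pmm_impute(x, y, k=3, rng=None):
    """Fill None entries of y by predictive mean matching on predictor x.

    Simplification: the regression is fitted once with point estimates;
    no posterior draw of the coefficients is made.
    """
    rng = rng or random.Random(0)
    obs = [i for i, v in enumerate(y) if v is not None]
    a, b = fit_line([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # The k observed cases whose predictions are closest to case i.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            # Impute an actually observed value borrowed from one donor.
            out[i] = y[rng.choice(donors)]
    return out


# Hypothetical toy data, not study data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.1, 5.3, None, 5.9, 6.4, None, 7.2, 7.8]
completed = pmm_impute(x, y)
```

Because every imputed value is copied from an observed donor, PMM cannot produce values outside the observed range, but it can still borrow from clinically inappropriate donors if the matching model ignores subgroup structure.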
Diagnostic Strategy
Standard marginal diagnostics, which compare the distributions of observed and imputed values, were found insufficient. For instance, PMM and CART imputations for HbA1c were distributed lower than the observed values; a naive reading might count this against these models, but clinical knowledge indicated that HbA1c was missing mostly in individuals without diabetes, for whom lower values are expected. Plausibility could therefore only be judged within strata defined by diabetes (for HbA1c) or hypertension (for SBP).
Conditional diagnostic plots, partitioned on these classifying variables, provided critical insight. Notably, PMM and NORM yielded implausibly high HbA1c imputations among those without diabetes and out-of-range SBP values among those without hypertension, exposing failures in these model classes. CART showed better conditional fidelity but still overestimated SBP in non-hypertensives until further refinement. A stratified imputation approach, modeling subgroups defined by comorbidity and medication status separately, fully resolved these discrepancies.
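The contrast between a marginal and a conditional check can be illustrated numerically. The sketch below compares the mean of observed and imputed values overall and within strata; the function name `observed_vs_imputed` and all values are hypothetical, and a numerical summary is used here only as a stand-in for the graphical diagnostics described in the text.

```python
from statistics import mean


def observed_vs_imputed(values, was_missing, stratum=None):
    """Mean of observed and imputed values, overall or per stratum level."""
    groups = {}
    for i, v in enumerate(values):
        level = "all" if stratum is None else stratum[i]
        key = "imputed" if was_missing[i] else "observed"
        groups.setdefault(level, {"observed": [], "imputed": []})[key].append(v)
    return {
        level: {k: (mean(vs) if vs else None) for k, vs in parts.items()}
        for level, parts in groups.items()
    }


# Hypothetical completed HbA1c values (%), not study data.
hba1c = [8.0, 8.4, 7.6, 8.0, 5.4, 5.6, 5.5, 5.5, 5.4, 5.6]
was_missing = [False, False, False, True, False, False, True, True, True, True]
diabetes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

marginal = observed_vs_imputed(hba1c, was_missing)
conditional = observed_vs_imputed(hba1c, was_missing, diabetes)
```

In this toy example the imputed values look too low marginally (mean about 6.0 against 7.0 observed) only because most missing values belong to people without diabetes; within each diabetes stratum the observed and imputed means agree.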
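The stratified approach can be sketched as running an imputation engine separately inside each subgroup and reassembling the column. In the plain-Python sketch below, `hot_deck` is a deliberately simple stand-in engine (a random draw from observed values), not CART or any engine used in the study; `impute_by_stratum`, the seed, and the toy values are likewise assumptions for illustration.

```python
import random


def hot_deck(values, rng):
    """Stand-in engine: replace each None with a random observed value."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def impute_by_stratum(values, stratum, engine=hot_deck, seed=0):
    """Run the engine separately inside each stratum, then reassemble."""
    rng = random.Random(seed)
    out = list(values)
    for level in sorted(set(stratum)):
        idx = [i for i, s in enumerate(stratum) if s == level]
        filled = engine([values[i] for i in idx], rng)
        for i, v in zip(idx, filled):
            out[i] = v
    return out


# Hypothetical SBP values (mmHg), not study data.
sbp = [118, None, 122, 120, 151, None, 148, None]
hypertension = [0, 0, 0, 0, 1, 1, 1, 1]
completed = impute_by_stratum(sbp, hypertension)
```

Because donors never cross the stratum boundary, a non-hypertensive patient cannot receive a value borrowed from a hypertensive one. The sketch assumes every stratum contains at least one observed value; an empty donor pool would raise an error.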
Empirical Results and Inferential Robustness
Demographically, lower nSES tertiles displayed higher proportions of Black participants, higher mean SBP, and increased diabetes and hypertension prevalence. Patterns of missingness and observed marginal covariate values suggested pronounced nonrandom missing data mechanisms.
Hazard ratios for CVD admission as a function of nSES (adjusted for all covariates) were highly consistent between complete case analyses and those using MI with CART:
- First vs. highest nSES tertile: HR=1.10 (95% CI: 0.99, 1.22)
- Second vs. highest nSES tertile: HR=1.07 (95% CI: 0.96, 1.18)
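This summary does not state how estimates from the imputed datasets were combined; the conventional approach is Rubin's rules, applied on the log hazard ratio scale. The sketch below assumes that approach. The function name `pool_rubin` and all numbers are hypothetical, and the interval uses a normal approximation rather than the t reference distribution that Rubin's rules specify.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 divisor)
    total = w + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Hypothetical log hazard ratios and squared SEs from m = 5 imputed datasets.
log_hr = [0.09, 0.10, 0.11, 0.10, 0.10]
se2 = [0.0028, 0.0029, 0.0027, 0.0028, 0.0028]

q, t = pool_rubin(log_hr, se2)
lo = q - 1.96 * math.sqrt(t)      # normal approximation, a simplification
hi = q + 1.96 * math.sqrt(t)
hr, ci = math.exp(q), (math.exp(lo), math.exp(hi))
```

The between-imputation term is what carries the extra uncertainty due to missingness into the final interval; when imputations barely differ across datasets, the pooled variance is close to the average within-dataset variance.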
No significant interactions between diabetes subgroup and HbA1c were detected. Probability of 5-year event-free survival was, as anticipated, lower in those with both prevalent diabetes and hypertension, with only marginal differences between high and low nSES.
The key assertion is that, in this context, the specific choice of MI model had minimal impact on inferential and predictive quantities once diagnostics had been applied and CART was used with conditional or stratified imputation, whereas poor choices under default implementations (unstratified PMM/NORM) could yield substantively incorrect imputations.
Practical and Theoretical Implications
This study foregrounds the necessity of combining domain-informed conditional diagnostics with flexible, nonparametric MI engines (e.g., CART) to support the scientific validity of EHR-based epidemiological inference. Standard marginal diagnostics are insufficient on their own; rigorous assessment requires multivariate, subgroup-directed visualization and evaluation. Default parametric strategies, even as provided in widely used MI software packages, are prone to produce clinically nonsensical imputations when relationships between variables are nonlinear or strongly determined by subgroup membership.
From a methodological standpoint, the results imply that:
- Nonparametric MI (CART) is robust against nonlinearity, interaction effects, and non-normality prevalent in real-world EHR data,
- MI diagnostics must be conditional and stratified, and must incorporate domain-specific mechanisms of missingness and plausible value distributions,
- Sequential specification strategies are essential for sets of collinear or structurally linked variables.
For practice, the study prescribes that analysts in EHR-based research explicitly integrate MI diagnostic procedures into their workflow and avoid reliance on generic, univariate reporting.
Prospects for Future Directions
Given the current study's scope, internal to a single urban area and two health systems, these MI diagnostic frameworks should be tested in federated or more heterogeneous EHR environments, especially those with divergent missingness patterns and multilevel structures. As the AI and clinical data science community continues to scale analyses across distributed EHR corpora, integrating robust, flexible, and diagnostically validated MI procedures will be essential, and automated multimodal diagnostics could be a focus for methodological innovation.
Conclusion
This study provides a rigorous blueprint for implementing and evaluating multiple imputation diagnostics in EHR-based observational analyses. The findings reinforce the need for nonparametric imputation strategies (such as CART), domain-aware conditional diagnostics, and careful model specification for multivariate and subgroup-conditional distributions. The advocated workflow supports validity and reliability in downstream epidemiological inference from EHR data, with clear implications for future practice as machine learning and AI methods become standard in clinical research pipelines.
(2604.10706)