Multiple imputation of covariates by fully conditional specification: accommodating the substantive model (1210.6799v3)

Published 25 Oct 2012 in stat.ME

Abstract: Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation (MI). Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of MI may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing MI, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it to existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible.

Citations (392)


Summary

The paper introduces SMC-FCS, a modification of fully conditional specification in which covariates are imputed from models that are compatible with the substantive model, including non-linear models.
The method accommodates substantive models with interactions and non-linear covariate effects, reducing the bias caused by mis-specified imputation models.
Simulation studies show more accurate parameter estimates and better confidence interval coverage than standard methods, and an application to Alzheimer’s disease data illustrates the approach.

Fully Conditional Specification in Multiple Imputation of Covariates for Non-linear Models

The paper by Bartlett, Seaman, White, and Carpenter addresses the handling of missing covariate data in epidemiological and clinical research. It focuses on keeping multiple imputation (MI) valid when the substantive model is non-linear, such as the Cox proportional hazards model, or contains non-linear (e.g. squared) or interaction terms.

Background and Motivation

Conventional MI, whether based on a joint model or on fully conditional specification (FCS), often imputes covariates without checking that the imputation models are compatible with the substantive model. When the substantive model includes non-linear effects or interactions, as is sometimes the case in medical research, default imputation models can lead to biased estimates. The paper shows that standard software implementations of MI may impute from such incompatible models, which motivates a method that accommodates the substantive model.
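To make the notion of incompatibility concrete, here is a small worked example. It is an illustration under assumed notation (the symbols $\beta$, $\sigma$, $\mu$, $\tau$, $\gamma$, $\omega$ are not taken from this summary): a linear substantive model with a quadratic covariate effect and a normal marginal distribution for the covariate.

```latex
% Assumed substantive model and covariate distribution (illustrative only)
Y \mid X \sim N\!\left(\beta_0 + \beta_1 X + \beta_2 X^2,\ \sigma^2\right),
\qquad X \sim N(\mu, \tau^2).

% Implied conditional distribution of the covariate given the outcome
f(x \mid y) \;\propto\;
\exp\!\left\{-\frac{\left(y - \beta_0 - \beta_1 x - \beta_2 x^2\right)^2}{2\sigma^2}\right\}
\exp\!\left\{-\frac{(x - \mu)^2}{2\tau^2}\right\}.

% A default linear-regression imputation model, by contrast, assumes
X \mid Y \sim N\!\left(\gamma_0 + \gamma_1 Y,\ \omega^2\right).
```

When $\beta_2 \neq 0$ the exponent of $f(x \mid y)$ is a quartic polynomial in $x$, so the implied conditional distribution is not normal, and no choice of $\gamma_0$, $\gamma_1$, $\omega$ makes the default imputation model agree with it. That mismatch is what is meant by an imputation model that is incompatible with the substantive model.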

Methods and Simulation Studies

The authors propose a modification of FCS called substantive model compatible FCS (SMC-FCS). Each partially observed covariate is imputed from a model that is compatible with the substantive model, so the imputations respect the non-linear terms or interactions that the analysis will estimate. Using imputation models that are compatible or semi-compatible with the substantive model reduces the bias that arises from mis-specified imputation models.
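One way to draw from an imputation distribution of this kind is rejection sampling: propose a covariate value from a model for the covariate, then accept it with probability proportional to the substantive-model density of the observed outcome. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a normal linear substantive model with a quadratic term, a normal proposal with no other covariates, and fixed parameter values (a full algorithm would also redraw the parameters at each iteration). The function name `draw_compatible_x` and all parameter names are hypothetical.

```python
import numpy as np


def draw_compatible_x(y, beta, sigma, mu, tau, rng, max_tries=100_000):
    """Draw x from a density proportional to f(y | x) * f(x) by rejection sampling.

    Assumed substantive model: y | x ~ N(b0 + b1*x + b2*x**2, sigma**2).
    Assumed proposal:          x ~ N(mu, tau**2).
    """
    b0, b1, b2 = beta
    for _ in range(max_tries):
        x_star = rng.normal(mu, tau)  # propose from the covariate model
        resid = y - (b0 + b1 * x_star + b2 * x_star**2)
        # The normal density of y is bounded by its value at zero residual,
        # so the ratio f(y | x*) / bound is exp(-resid^2 / (2 sigma^2)).
        accept_prob = np.exp(-resid**2 / (2.0 * sigma**2))
        if rng.uniform() <= accept_prob:
            return x_star
    raise RuntimeError("no proposal accepted; the proposal may fit poorly")


# Hypothetical usage: impute x five times for one subject with y = 3.0.
rng = np.random.default_rng(2012)
imputed = [
    draw_compatible_x(y=3.0, beta=(0.0, 1.0, 0.5), sigma=1.0,
                      mu=0.0, tau=1.0, rng=rng)
    for _ in range(5)
]
print(imputed)
```

Because accepted draws follow the product of the proposal density and the outcome density, they are compatible with the quadratic substantive model by construction. The cost is that many proposals may be rejected when the proposal distribution is far from the target.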

Simulations covering a range of covariate and outcome scenarios, including linear regression with quadratic effects and Cox proportional hazards models, compare SMC-FCS with conventional FCS and with alternatives such as the 'just another variable' (JAV) approach. They show that SMC-FCS yields unbiased estimates even with non-linear covariate effects or interactions, provided data are missing at random (MAR) and the imputation models are correctly specified and mutually compatible. Across these settings, including logistic regression, it frequently outperformed standard FCS in both the accuracy of parameter estimates and the coverage of confidence intervals.
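The summary does not describe how estimates from the imputed datasets are combined. In MI analyses generally, this is done with Rubin's rules, and the pooled standard error is what drives confidence interval coverage. The sketch below shows that pooling step; the function name `pool_rubin` and its inputs are hypothetical and are not taken from the paper.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    Returns the pooled estimate, its standard error, and the
    degrees of freedom for the reference t distribution.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = q.mean()                   # pooled point estimate
    within = u.mean()                  # average within-imputation variance
    between = q.var(ddof=1)            # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    if between == 0.0:
        df = np.inf                    # no between-imputation variability
    else:
        df = (m - 1) * (1.0 + within / ((1.0 + 1.0 / m) * between)) ** 2
    return q_bar, np.sqrt(total), df


# Hypothetical usage with three imputed-data estimates of one coefficient.
estimate, std_err, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, std_err, df)
```

A confidence interval is then the pooled estimate plus or minus a t quantile with `df` degrees of freedom times the standard error.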

Application to Alzheimer’s Disease Data

To illustrate the method on real data, the authors apply SMC-FCS to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The substantive model is for the time to conversion from mild cognitive impairment to Alzheimer's disease, with covariates known to influence this transition. By accounting for non-linear associations, SMC-FCS preserved relationships such as the quadratic effect of the amyloid β 1-42 peptide, which were attenuated under traditional MI methods.

Implications and Future Directions

This work matters for statistical practice in medical research, where the validity of inference depends on how missing data are imputed. The approach is more computationally demanding than standard FCS because imputed values are drawn by rejection sampling. Further research could focus on reducing this computational cost or on integrating SMC-FCS into mainstream statistical software.

The authors also address the case where several substantive models are of interest, suggesting that nested substantive models can be handled within the SMC-FCS framework. This offers an alternative to one-size-fits-all imputation practice and could be applied in other domains.

Conclusion

In sum, this paper provides a methodologically rigorous alternative to existing imputation techniques for settings that require compatibility with non-linear substantive models. It gives researchers a tool for handling missing covariate data that can improve the accuracy and reliability of epidemiological and clinical analyses, with applications beyond biomedical contexts.
