Reliability of statistical inference after tree-based imputation
Determine whether statistical inference, specifically the validity of hypothesis tests and parameter estimates, is reliable when missing values in empirical social science datasets are imputed with tree-based multiple imputation methods. The aim is to establish whether standard inferential procedures maintain the nominal Type I error rate and adequate power when the imputation model is tree-based rather than parametric.
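One way to make "trustworthy Type I error control" concrete is to simulate data under the null hypothesis, impute, test, and compare the share of rejections with the nominal level. The sketch below covers only that last comparison step. The function names `empirical_type1_error` and `is_within_tolerance` are hypothetical, and the tolerance of 1.96 Monte Carlo standard errors is an assumption made for illustration, not a criterion stated in the source.

```python
import math


def empirical_type1_error(p_values, alpha=0.05):
    """Return (rejection rate, Monte Carlo SE) for p-values simulated under the null.

    Hypothetical helper: the p-values are assumed to come from repeated
    simulation runs in which the null hypothesis is true.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Share of simulation runs in which the null was (wrongly) rejected.
    rate = sum(1 for p in p_values if p < alpha) / n
    # Binomial standard error of that share if the true rate equals alpha.
    mc_se = math.sqrt(alpha * (1 - alpha) / n)
    return rate, mc_se


def is_within_tolerance(rate, mc_se, alpha=0.05, z=1.96):
    """True if the observed rate lies within z Monte Carlo SEs of alpha.

    The choice z = 1.96 is an illustrative assumption.
    """
    return abs(rate - alpha) <= z * mc_se


# Illustrative input only: an evenly spaced grid standing in for
# well-calibrated p-values, not results from any study.
calibrated = [(i + 0.5) / 1000 for i in range(1000)]
rate, mc_se = empirical_type1_error(calibrated)
print(rate, is_within_tolerance(rate, mc_se))
```

A rejection rate well above the tolerance band would indicate inflated Type I errors, while a rate well below it would indicate a conservative procedure that is likely to lose power.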
References
Despite the growing use of tree-based imputation tools such as missRanger in empirical studies, a critical question remains unanswered: is statistical inference reliable for data imputed using tree-based methods? The question matters because a predecessor of missRanger, the original missForest, which does not allow for predictive mean matching, has led to inflated Type I errors for specific designs in previous research.