Recoverability of Causal Effects under Presence of Missing Data: a Longitudinal Case Study (2402.14562v3)
Abstract: Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of "closed missingness mechanisms" and show that under these mechanisms an available case analysis is admissible for consistent estimation for any type of statistical and causal query, even if the underlying missingness mechanism is of missing not at random (MNAR) type. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses demonstrate an innovative application of missingness DAGs to complex longitudinal real-world data, while highlighting the sensitivity of the results with respect to the assumed causal model.
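The claim that a complete case analysis can remain consistent under MNAR can be illustrated with a small toy simulation. This is a minimal sketch and not the simulation from the paper: the data-generating model, the coefficients, the function name `simulate_self_masking`, and the choice of a self-masking covariate are all assumptions made for illustration, and the example does not attempt to reproduce the paper's definition of closed missingness mechanisms. It only shows that when missingness in a covariate X depends on X itself (an MNAR mechanism) but not on the outcome Y given X, the complete case regression of Y on X still recovers the true slope, while the observed-data mean of X is biased.

```python
import numpy as np


def simulate_self_masking(n=200_000, seed=0):
    """Toy MNAR example (hypothetical setup, not taken from the paper).

    X is missing with a probability that depends on X's own value.
    Returns the complete-case slope of Y on X, the mean of the
    observed X values, and the mean of X in the full data.
    """
    rng = np.random.default_rng(seed)

    # Full data: Y = 1 + 2 * X + noise (assumed true slope: 2.0)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Self-masking missingness: larger X values are more likely missing.
    p_missing = 1.0 / (1.0 + np.exp(-(x - 0.5)))
    observed = rng.random(n) > p_missing  # True where X is observed

    # Complete case analysis: ordinary least squares on rows with X observed.
    slope_cc = np.polyfit(x[observed], y[observed], deg=1)[0]

    return slope_cc, x[observed].mean(), x.mean()


if __name__ == "__main__":
    slope_cc, mean_x_observed, mean_x_full = simulate_self_masking()
    print(f"complete-case slope of Y on X: {slope_cc:.3f}")
    print(f"mean of observed X:            {mean_x_observed:.3f}")
    print(f"mean of X in full data:        {mean_x_full:.3f}")
```

In this sketch the selection of rows depends only on X, so the conditional distribution of Y given X is the same in the complete cases as in the full data, which is why the slope is recovered. Whether a given query is recoverable in a real study depends on the assumed m-DAG, as the abstract stresses.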