A cautious approach to constraint-based causal model selection (2404.18232v1)
Abstract: We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the existence or non-existence of edges (causal connections) in a graph by testing a series of conditional independence hypotheses. In settings where the ultimate scientific goal is to use the selected graph to inform estimation of some causal effect of interest (e.g., by selecting a valid and sufficient set of adjustment variables), we argue that a "cautious" approach to graph selection should control the probability of falsely removing edges and prefer dense, rather than sparse, graphs. We propose a simple inversion of the usual conditional independence testing procedure: to remove an edge, test the null hypothesis that the conditional association exceeds some user-specified threshold, rather than the null of independence. This equivalence-testing formulation of independence constraints leads to a procedure with desirable statistical properties and behaviors that better match the inferential goals of certain scientific studies, for example observational epidemiological studies that aim to estimate causal effects in the face of causal model uncertainty. We illustrate our approach on a data example from environmental epidemiology.
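The inverted test described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian setting in which conditional association is measured by a partial correlation, and it uses Fisher's z-transform with two one-sided tests (TOST). The function names, the default threshold `delta`, and the level `alpha` are hypothetical choices made for illustration; this is not necessarily the exact procedure used in the paper.

```python
import math
from statistics import NormalDist


def equivalence_pvalue(r, n, k, delta):
    """p-value for H0: |rho| >= delta versus H1: |rho| < delta.

    r     : sample partial correlation, strictly between -1 and 1
    n     : sample size
    k     : number of variables in the conditioning set
    delta : user-specified association threshold, in (0, 1)

    Assumes approximate normality of Fisher's z-transform of r
    (an illustrative modelling assumption).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n - k - 3 <= 0:
        raise ValueError("need n > k + 3 for the Fisher z approximation")
    se = 1.0 / math.sqrt(n - k - 3)   # approximate standard error of z
    z = math.atanh(r)                 # Fisher z-transform of the estimate
    z_delta = math.atanh(delta)       # threshold on the same scale
    std_normal = NormalDist()
    # One-sided test of H0: rho >= delta (small when z is far below z_delta).
    p_upper = std_normal.cdf((z - z_delta) / se)
    # One-sided test of H0: rho <= -delta (small when z is far above -z_delta).
    p_lower = 1.0 - std_normal.cdf((z + z_delta) / se)
    # TOST: both one-sided nulls must be rejected, so report the larger p-value.
    return max(p_upper, p_lower)


def remove_edge(r, n, k, delta=0.1, alpha=0.05):
    """Remove the edge only if the association is shown to be below delta."""
    return equivalence_pvalue(r, n, k, delta) < alpha
```

Under this formulation the edge is kept by default: with little data, even an estimated partial correlation of zero does not justify removal, because the test cannot rule out an association as large as `delta`. This is the sense in which the procedure favors dense graphs when evidence is weak.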