Recursive Causal Discovery (2403.09300v1)
Abstract: Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data leads to errors in statistical testing, and the computational complexity of the learning task is daunting. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. The presence and identification of removable variables enable recursive approaches to causal discovery, a promising solution that addresses the aforementioned challenges by successively reducing the problem size. This reduction not only minimizes the conditioning sets in each conditional independence (CI) test, leading to fewer errors, but also significantly decreases the number of required CI tests. The worst-case performance of these methods nearly matches the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. A comprehensive literature review is also included, comparing the computational complexity of our methods with existing approaches and showcasing their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com.
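The recursive idea described in the abstract can be illustrated with a minimal sketch. This is not the RCD package API: the function names (`recursive_skeleton`, `toy_find_removable`, `toy_find_neighbors`) and the toy graph are hypothetical. In place of CI tests on data, the toy callbacks read a known ground-truth DAG, and "removable" is approximated by "has no children among the remaining variables", which is a simplifying assumption made only for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

# Hypothetical ground-truth DAG (parent -> set of children), used as a toy oracle.
TRUE_DAG: Dict[str, Set[str]] = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D"},
    "D": set(),
}


def recursive_skeleton(
    variables: Iterable[str],
    find_removable: Callable[[Set[str]], str],
    find_neighbors: Callable[[str, Set[str]], Set[str]],
) -> Set[FrozenSet[str]]:
    """Generic recursive loop: pick a removable variable, record its
    neighbors among the remaining variables, drop it, and repeat on
    the smaller problem."""
    remaining = set(variables)
    edges: Set[FrozenSet[str]] = set()
    while remaining:
        x = find_removable(remaining)
        for y in find_neighbors(x, remaining):
            edges.add(frozenset((x, y)))
        remaining.remove(x)  # the problem size shrinks by one each round
    return edges


def toy_find_removable(remaining: Set[str]) -> str:
    # Assumption: a variable with no children among the remaining ones
    # stands in for a removable variable. A real method would decide
    # this from CI tests, not from the true graph.
    for x in sorted(remaining):
        if not (TRUE_DAG[x] & remaining):
            return x
    raise ValueError("no childless variable found; the graph is not acyclic")


def toy_find_neighbors(x: str, remaining: Set[str]) -> Set[str]:
    # Toy oracle: read parents and children of x from the true graph,
    # restricted to the variables that have not been removed yet.
    parents = {p for p in remaining if x in TRUE_DAG[p]}
    children = TRUE_DAG[x] & remaining
    return parents | children


learned = recursive_skeleton(TRUE_DAG, toy_find_removable, toy_find_neighbors)
```

Each iteration works only with the variables still in `remaining`, which is the mechanism the abstract credits for smaller conditioning sets and fewer CI tests; swapping the two toy callbacks for data-driven procedures is where an actual method would differ.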
- Efficient intervention design for causal discovery with latents. In International Conference on Machine Learning, pages 63–73. PMLR, 2020.
- Interventional causal representation learning. In International Conference on Machine Learning, pages 372–407. PMLR, 2023.
- Recursive causal structure learning in the presence of latent variables and selection bias. Advances in Neural Information Processing Systems, 34:10119–10130, 2021.
- A characterization of Markov equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2):505–541, 1997.
- Learning directed acyclic graphs with penalized neighbourhood regression. arXiv preprint arXiv:1511.08963, 2015.
- Survey and evaluation of causal discovery methods for time series. Journal of Artificial Intelligence Research, 73:767–819, 2022.
- Ordering-based causal structure learning in the presence of latent variables. In International Conference on Artificial Intelligence and Statistics, pages 4098–4108. PMLR, 2020.
- Differentiable causal discovery from interventional data. Advances in Neural Information Processing Systems, 33:21865–21877, 2020.
- CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6):2526–2556, 2014.
- Wray Buntine. Theory refinement on Bayesian networks. In Uncertainty Proceedings 1991, pages 52–60. Elsevier, 1991.
- David Maxwell Chickering. Learning Bayesian networks is NP-complete. Learning from Data: Artificial Intelligence and Statistics V, pages 121–130, 1996.
- David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.
- Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5, 2004.
- Causal structure discovery for spatio-temporal data. In International Conference on Database Systems for Advanced Applications, pages 236–250. Springer, 2014.
- Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, pages 294–321, 2012.
- Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15(1):3741–3782, 2014.
- A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347, 1992.
- Integrating locally learned causal structures with overlapping variables. Advances in Neural Information Processing Systems, 21, 2008.
- On causal discovery from time series data using FCI. Probabilistic Graphical Models, pages 121–128, 2010.
- Learning sparse causal Gaussian networks with experimental intervention: regularization and coordinate descent. Journal of the American Statistical Association, 108(501):288–300, 2013.
- Markov blanket based feature selection: A review of past decade. In Proceedings of the world congress on engineering, volume 1, pages 321–328. Newswood Ltd, 2010.
- Learning Gaussian networks. In Uncertainty Proceedings 1994, pages 235–243. Elsevier, 1994.
- Optimal experiment design for causal discovery from fixed number of experiments. arXiv preprint arXiv:1702.08567, 2017.
- Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10:524, 2019.
- Causal discovery from temporal data: An overview and new perspectives. arXiv preprint arXiv:2303.10112, 2023.
- Learning functional causal models with generative neural networks. Explainable and interpretable models in computer vision and machine learning, pages 39–80, 2018.
- Penalized estimation of directed acyclic graphs from discrete data. Statistics and Computing, 29:161–176, 2019.
- Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
- PC algorithm for nonparanormal graphical models. Journal of Machine Learning Research, 14(11), 2013.
- Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55(4):926–939, 2014.
- Learning causal structures based on Markov equivalence class. In Algorithmic Learning Theory: 16th International Conference, ALT 2005, Singapore, October 8-11, 2005. Proceedings 16, pages 92–106. Springer, 2005.
- Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.
- Randomized experimental design for causal graph discovery. Advances in Neural Information Processing Systems, 27, 2014.
- Experiment selection for causal discovery. Journal of Machine Learning Research, 14:3041–3071, 2013.
- Structural agnostic modeling: Adversarial learning of causal graphs. The Journal of Machine Learning Research, 23(1):9831–9892, 2022.
- Characterization and learning of causal graphs with latent variables from soft interventions. Advances in Neural Information Processing Systems, 32, 2019.
- Exact learning of bounded tree-width Bayesian networks. In Artificial Intelligence and Statistics, pages 370–378. PMLR, 2013.
- Greedy relaxations of the sparsest permutation algorithm. In Uncertainty in Artificial Intelligence, pages 1052–1062. PMLR, 2022.
- A fast PC algorithm for high dimensional causal discovery with multi-core PCs. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(5):1483–1495, 2016.
- Scaling structural learning with NO-BEARS to infer causal transcriptome networks. In Pacific Symposium on Biocomputing 2020, pages 391–402. World Scientific, 2019.
- Causal discovery from observational and interventional data across multiple environments. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Bayesian network induction via local neighborhoods. Advances in Neural Information Processing Systems, 12:505–511, 1999.
- Christopher Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 403–410, 1995.
- Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- A recursive Markov boundary-based approach to causal structure learning. In The KDD’21 Workshop on Causal Discovery, pages 26–54. PMLR, 2021.
- Learning Bayesian networks in the presence of structural side information. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 7814–7822, 2022.
- Novel ordering-based approaches for causal structure learning in the presence of unobserved variables. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12260–12268, 2023a.
- A unified experiment design approach for cyclic and acyclic causal models. Journal of Machine Learning Research, 24(354):1–31, 2023b.
- On causal discovery with cyclic additive noise models. Advances in Neural Information Processing Systems, 24, 2011.
- Distinguishing cause from effect using observational data: methods and benchmarks. The Journal of Machine Learning Research, 17(1):1103–1204, 2016.
- A graph autoencoder approach to causal structure learning. arXiv preprint arXiv:1911.07420, 2019.
- Advances in learning Bayesian networks of bounded treewidth. Advances in Neural Information Processing Systems, 27:2285–2293, 2014.
- DYNOTEARS: Structure learning from time-series data. In International Conference on Artificial Intelligence and Statistics, pages 1595–1605. PMLR, 2020.
- Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
- Judea Pearl. Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press, 19(2):3, 2000.
- Judea Pearl. Causality. Cambridge University Press, 2009.
- Finding latent causes in causal networks: an efficient approach based on Markov blankets. Neural Information Processing Systems Foundation, 2008a.
- Using Markov blankets for causal structure learning. Journal of Machine Learning Research, 9(Jul):1295–1342, 2008b.
- Adrian E Raftery. Bayesian model selection in social research. Sociological Methodology, pages 111–163, 1995.
- Turbocharging treewidth-bounded Bayesian network structure learning. In Proceedings of AAAI-21, the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021.
- Adjacency-faithfulness and conservative causal inference. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 401–408, 2006.
- A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3:121–129, 2017.
- Learning directed acyclic graph models based on sparsest permutations. Stat, 7(1):e183, 2018.
- Ancestral graph Markov models. The Annals of Statistics, 30(4):962–1030, 2002.
- Thomas S Richardson. Discovering cyclic causal structure. Carnegie Mellon [Department of Philosophy], 1996.
- Thomas S Richardson. A discovery algorithm for directed cyclic graphs. arXiv preprint arXiv:1302.3599, 2013.
- Learning Bayesian networks with thousands of variables. Advances in Neural Information Processing Systems, 28, 2015.
- Learning graphical model structure using L1-regularization paths. In AAAI, volume 7, pages 1278–1283, 2007.
- Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
- NODAGS-Flow: Nonlinear cyclic causal structure learning. In International Conference on Artificial Intelligence and Statistics, pages 6371–6387. PMLR, 2023.
- Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika, 108(4):795–814, 2021.
- A polynomial time algorithm for determining DAG equivalence in the presence of latent variables and selection bias. In Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, pages 489–500. Citeseer, 1996.
- Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 499–506, 1995.
- Causation, prediction, and search. MIT Press, 2000.
- A kernel-based causal learning algorithm. In Proceedings of the 24th International Conference on Machine Learning, pages 855–862, 2007.
- Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1999.
- Ordering-based search: a simple and effective algorithm for learning Bayesian networks. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pages 584–590, 2005.
- Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
- Algorithms for large scale Markov blanket discovery. In FLAIRS Conference, volume 2, pages 376–380, 2003.
- Equivalence and synthesis of causal models. UCLA, Computer Science Department, 1991.
- D’ya like DAGs? A survey on structure learning and causal discovery. ACM Computing Surveys, 55(4):1–36, 2022.
- Desiderata for representation learning: A causal perspective. arXiv preprint arXiv:2109.03795, 2021.
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256, 1992.
- Speculative Markov blanket discovery for optimal feature selection. In Fifth IEEE International Conference on Data Mining (ICDM’05), pages 4–pp. IEEE, 2005.
- Mining Markov blankets without causal sufficiency. IEEE Transactions on Neural Networks and Learning Systems, 29(12):6333–6347, 2018.
- Directed graphical models and causal discovery for zero-inflated data. In Conference on Causal Learning and Reasoning, pages 27–67. PMLR, 2023.
- DAG-GNN: DAG structure learning with graph neural networks. In International Conference on Machine Learning, pages 7154–7163. PMLR, 2019.
- Jiji Zhang. Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9:1437–1474, 2008a.
- Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16-17):1873–1896, 2008b.
- Kernel-based conditional independence test and application in causal discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 804–813, 2011.
- High-dimensional functional graphical model structure learning via neighborhood selection approach. Electronic Journal of Statistics, 18(1):1042–1129, 2024.
- DAGs with NO TEARS: Continuous optimization for structure learning. Advances in Neural Information Processing Systems, 31, 2018.