Learning Large Causal Structures from Inverse Covariance Matrix via Sparse Matrix Decomposition (2211.14221v3)
Abstract: Learning causal structures from observational data is a fundamental problem that faces serious computational challenges when the number of variables is large. In the context of linear structural equation models (SEMs), this paper focuses on learning causal structures from the inverse covariance matrix. The proposed method, called ICID (Independence-preserving Decomposition from Inverse Covariance matrix), is based on continuous optimization of a matrix decomposition model that preserves the nonzero pattern of the inverse covariance matrix. We show, through theoretical and empirical evidence, that ICID efficiently identifies the sought directed acyclic graph (DAG) when the noise variances are known. Moreover, ICID is shown empirically to be robust under bounded misspecification of the noise variances when they are non-equal. The proposed method has low computational complexity, as reflected by its running time in the experiments, and enables a novel regularization scheme that yields highly accurate solutions on the simulated fMRI data of Smith et al. (2011) in comparison with state-of-the-art algorithms.
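To make the modeling identity behind this approach concrete, the following is a minimal sketch, not the authors' ICID implementation: it only illustrates, on an assumed toy DAG with illustrative variable names, the precision-matrix factorization that a linear SEM with independent noise induces, which is what a decomposition of the inverse covariance matrix seeks to invert.

```python
# Minimal sketch (not the authors' ICID code): the precision-matrix
# factorization induced by a linear SEM with independent noise.
# Convention: B[i, j] != 0 means an edge i -> j, so X = B^T X + N.
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Toy DAG: strictly upper-triangular weights (variables already in causal order).
mask = rng.integers(0, 2, size=(d, d))
B = np.triu(rng.uniform(0.5, 1.5, size=(d, d)) * mask, k=1)

# Diagonal noise variances (the paper's identifiability analysis assumes these are known).
omega = rng.uniform(0.8, 1.2, size=d)
Omega = np.diag(omega)

I = np.eye(d)
# X = (I - B^T)^{-1} N  =>  Sigma = (I - B^T)^{-1} Omega (I - B)^{-1}
Sigma = np.linalg.inv(I - B.T) @ Omega @ np.linalg.inv(I - B)
Theta = np.linalg.inv(Sigma)

# The factorization that a decomposition-based method works with:
#   Theta = (I - B) Omega^{-1} (I - B)^T
Theta_factored = (I - B) @ np.linalg.inv(Omega) @ (I - B).T
assert np.allclose(Theta, Theta_factored)

# Generically (absent cancellations), the support of Theta is the moral graph
# of the DAG: direct edges plus parents that share a child. This is why a
# decomposition preserving the nonzero pattern of Theta carries information
# about the DAG structure.
print("nonzero pattern of Theta:\n", (np.abs(Theta) > 1e-8).astype(int))
print("DAG adjacency:\n", (B != 0).astype(int))
```

The sketch only verifies the identity; recovering B and the noise variances from such a factorization, under a sparsity constraint tied to the pattern of the inverse covariance matrix, is the optimization problem that the paper addresses.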
- Globally optimal score-based learning of directed acyclic graphs in high-dimensions. Advances in Neural Information Processing Systems, 32, 2019.
- Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
- Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
- D. P. Bertsekas. Nonlinear Programming, 2nd edition. 1999. ISBN 1886529000. URL http://www.citeulike.org/group/4340/article/1859441.
- Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic Resonance in Medicine, 39(6):855–864, 1998.
- The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313–2351, 2007.
- On causal discovery with an equal-variance assumption. Biometrika, 106(4):973–980, 2019.
- David Maxwell Chickering. Learning Bayesian networks is NP-complete. Learning from Data: Artificial Intelligence and Statistics V, pages 121–130, 1996.
- David Maxwell Chickering. Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research, 2:445–498, 2002a.
- David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002b.
- From graphs to DAGs: a low-complexity model and a scalable algorithm. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 107–122. Springer, 2022.
- Sparse inverse covariance estimation with the graphical lasso. 2007. URL http://statweb.stanford.edu/~tibs/ftp/graph.pdf.
- Dynamic causal modelling. NeuroImage, 19(4):1273–1302, 2003.
- Incidence matrices and interval graphs. Pacific Journal of Mathematics, 15(3):835–855, 1965.
- Optimal estimation of Gaussian DAG models. In International Conference on Artificial Intelligence and Statistics, pages 8738–8757. PMLR, 2022.
- Characterizing distribution equivalence and structure learning for cyclic and acyclic directed graphs. In International Conference on Machine Learning, pages 3494–3504. PMLR, 2020.
- Learning linear structural equation models in polynomial time and sample complexity. In International Conference on Artificial Intelligence and Statistics, pages 1466–1475. PMLR, 2018.
- Cause Effect Pairs in Machine Learning. Springer, 2019. ISBN 978-3-030-21809-6. doi: 10.1007/978-3-030-21810-2. URL https://doi.org/10.1007/978-3-030-21810-2.
- Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. The Journal of Machine Learning Research, 14(1):111–152, 2013.
- Probabilistic graphical models: principles and techniques. MIT Press, 2009.
- High-dimensional learning of linear causal networks via inverse covariance estimation. The Journal of Machine Learning Research, 15(1):3065–3105, 2014.
- Large-scale differentiable causal discovery of factor graphs. Advances in Neural Information Processing Systems, 35:19290–19303, 2022.
- Christopher Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, pages 403–410. Morgan Kaufmann Publishers Inc., 1995. ISBN 1558603859.
- On the role of sparsity and DAG constraints for learning linear DAGs. Advances in Neural Information Processing Systems, 33:17943–17954, 2020.
- Reliable causal discovery with improved exact search and weaker assumptions. Advances in Neural Information Processing Systems, 34:20308–20320, 2021.
- Schur products and matrix completions. Journal of Functional Analysis, 85(1):151–178, 1989.
- Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
- Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society. Series B (Statistical Methodology), pages 947–1012, 2016.
- Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.
- A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3:121–129, 2017.
- Learning directed acyclic graph models based on sparsest permutations. Stat, 7(1):e183, 2018.
- Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. Advances in Neural Information Processing Systems, 34:27772–27784, 2021.
- Thomas Richardson. A polynomial-time algorithm for deciding Markov equivalence of directed cyclic graphical models. In Proceedings of the Twelfth International Conference on Uncertainty in Artificial Intelligence, UAI’96, pages 462–469, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc. ISBN 155860412X.
- Donald J Rose. Triangulated graphs and the elimination process. Journal of Mathematical Analysis and Applications, 32(3):597–609, 1970.
- Counterfactual generative networks. In International Conference on Learning Representations (ICLR), 2021.
- A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(10), 2006.
- DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. The Journal of Machine Learning Research, 12:1225–1248, 2011.
- Network modelling methods for fMRI. NeuroImage, 54(2):875–891, 2011.
- Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika, 108(4):795–814, 2021.
- Causation, prediction, and search. MIT Press, 2000.
- ICML Workshop on Algorithmic Recourse. 2021.
- Chordal graphs and semidefinite optimization. Foundations and Trends® in Optimization, 1(4):241–433, 2015. ISSN 2167-3888. doi: 10.1561/2400000006. URL http://dx.doi.org/10.1561/2400000006.
- Gherardo Varando. Learning DAGs without imposing acyclicity. arXiv preprint arXiv:2006.03005, 2020.
- DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems, volume 31, 2018. URL https://proceedings.neurips.cc/paper/2018/file/e347c51419ffb23ca3fd5050202f9c3d-Paper.pdf.