
Generalized Criterion for Identifiability of Additive Noise Models Using Majorization (2404.05148v1)

Published 8 Apr 2024 in stat.ME and stat.ML

Abstract: The discovery of causal relationships from observational data is very challenging. Many recent approaches rely on complexity or uncertainty concepts to impose constraints on probability distributions, aiming to identify specific classes of directed acyclic graph (DAG) models. In this paper, we introduce a novel identifiability criterion for DAGs that places constraints on the conditional variances of additive noise models. We demonstrate that this criterion extends and generalizes existing identifiability criteria in the literature that employ (conditional) variances as measures of uncertainty in (conditional) distributions. For linear structural equation models, we present a new algorithm that leverages the concept of weak majorization applied to the diagonal elements of the Cholesky factor of the covariance matrix to learn a topological ordering of variables. Through extensive simulations and the analysis of bank connectivity data, we provide evidence of the effectiveness of our approach in successfully recovering DAGs. The code for reproducing the results in this paper is available in the Supplementary Materials.
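
To make the Cholesky connection concrete, here is a minimal Python sketch, not the authors' implementation: the function name and the equal-error-variance toy example are assumptions for illustration. It uses the simple greedy rule of appending, at each step, the remaining variable whose conditional variance given the already-ordered variables is smallest. Those conditional variances are exactly the squared diagonal entries of the Cholesky factor of the covariance matrix permuted into the chosen order, which is the quantity the paper's weak-majorization criterion operates on.

```python
import numpy as np

def greedy_order_by_conditional_variance(cov):
    """Greedily build a topological ordering from a covariance matrix.

    At each step, append the remaining variable j minimizing
        Var(X_j | X_S) = cov[j, j] - cov[j, S] @ inv(cov[S, S]) @ cov[S, j],
    where S is the set of variables already ordered. This conditional
    variance equals the squared k-th diagonal entry of the Cholesky
    factor of cov permuted into the order (S, j).
    """
    p = cov.shape[0]
    order, remaining = [], list(range(p))
    while remaining:
        best_j, best_v = None, np.inf
        for j in remaining:
            if order:
                S = np.asarray(order)
                c_SS, c_jS = cov[np.ix_(S, S)], cov[j, S]
                v = cov[j, j] - c_jS @ np.linalg.solve(c_SS, c_jS)
            else:
                v = cov[j, j]  # empty conditioning set: marginal variance
            if v < best_v:
                best_j, best_v = j, v
        order.append(best_j)
        remaining.remove(best_j)
    return order

if __name__ == "__main__":
    # Toy linear SEM with equal error variances: X0 -> X1 -> X2.
    B = np.zeros((3, 3))
    B[1, 0] = B[2, 1] = 0.8
    A = np.linalg.inv(np.eye(3) - B)   # X = (I - B)^{-1} e, e ~ N(0, I)
    cov = A @ A.T

    order = greedy_order_by_conditional_variance(cov)
    print(order)                        # [0, 1, 2] -- the true causal order

    # The squared Cholesky diagonal of the permuted covariance gives the
    # conditional variances along the recovered ordering (all 1.0 here).
    L = np.linalg.cholesky(cov[np.ix_(order, order)])
    print(np.diag(L) ** 2)
```

Under the equal-variance assumption used in the demo, this minimum-conditional-variance rule recovers a valid topological ordering. The paper's weak-majorization criterion is strictly more general, so the sketch should be read as an illustration of the Cholesky-diagonal connection rather than as the proposed algorithm itself.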


