Boosting Causal Additive Models (2401.06523v1)

Published 12 Jan 2024 in stat.ML, cs.LG, math.PR, math.ST, and stat.TH

Abstract: We present a boosting-based method to learn additive Structural Equation Models (SEMs) from observational data, with a focus on the theoretical aspects of determining the causal order among variables. We introduce a family of score functions based on arbitrary regression techniques, for which we establish necessary conditions to consistently favor the true causal ordering. Our analysis reveals that boosting with early stopping meets these criteria and thus offers a consistent score function for causal orderings. To address the challenges posed by high-dimensional data sets, we adapt our approach through a component-wise gradient descent in the space of additive SEMs. Our simulation study supports our theoretical results in lower dimensions and demonstrates that our high-dimensional adaptation is competitive with state-of-the-art methods. In addition, it exhibits robustness with respect to the choice of hyperparameters, making the procedure easy to tune.
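The core idea — score each candidate causal ordering by regressing every variable on its predecessors with early-stopped, component-wise L2-boosting and summing the log residual variances — can be illustrated with a toy sketch. This is not the paper's implementation: the base learner here is a simple histogram (binned-mean) smoother standing in for the paper's regression components, the stopping rule is a fixed step count, and all names (`binned_fit`, `boost_residual_var`, `ordering_score`) are hypothetical. It uses a two-variable nonlinear SEM, X2 := X1² + noise, where the true ordering should receive the lower score.

```python
import random, math

def binned_fit(x, r, n_bins=10):
    """Piecewise-constant (histogram) base learner of residual r on x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, ri in zip(x, r):
        b = min(int((xi - lo) / width), n_bins - 1)
        sums[b] += ri
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda xs: [means[min(int((xi - lo) / width), n_bins - 1)]
                       for xi in xs]

def boost_residual_var(preds, y, n_steps=50, nu=0.1):
    """Component-wise L2-boosting of y on its predecessor columns,
    early-stopped after n_steps; returns the residual variance."""
    n = len(y)
    fit = [sum(y) / n] * n                      # start from the mean
    for _ in range(n_steps):
        r = [yi - fi for yi, fi in zip(y, fit)]
        best = None
        for x in preds:                          # greedily pick the component
            g = binned_fit(x, r)(x)              # that most reduces the RSS
            rss = sum((ri - gi) ** 2 for ri, gi in zip(r, g))
            if best is None or rss < best[0]:
                best = (rss, g)
        if best is None:                         # source variable: no parents
            break
        fit = [fi + nu * gi for fi, gi in zip(fit, best[1])]
    r = [yi - fi for yi, fi in zip(y, fit)]
    return sum(ri ** 2 for ri in r) / n

def ordering_score(cols, order):
    """Sum of log residual variances along a candidate causal ordering."""
    return sum(math.log(boost_residual_var([cols[p] for p in order[:i]],
                                           cols[v]))
               for i, v in enumerate(order))

random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(500)]
x2 = [xi ** 2 + random.gauss(0, 0.2) for xi in x1]   # X2 := X1^2 + noise
cols = [x1, x2]
s_true = ordering_score(cols, [0, 1])
s_wrong = ordering_score(cols, [1, 0])
print(s_true < s_wrong)   # the true ordering gets the lower score
```

The early stopping (here, capping the boosting iterations at `n_steps`) is what keeps the regression fits from overfitting the anticausal direction; without it, both orderings would explain the data almost equally well and the score would lose its discriminating power.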

