
Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models (2404.12592v2)

Published 19 Apr 2024 in stat.ME and stat.ML

Abstract: We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: i) they cannot provide optimality guarantees and can suffer from learning sub-optimal models; ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of some competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.
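
As a rough illustration of the data-generating setting the abstract describes, the sketch below draws samples from a linear Gaussian structural equation model whose noise standard deviations differ across nodes (the heteroscedastic case). It is not the paper's method and does not use the micodag package; the function name `simulate_linear_gaussian_sem`, the three-node chain, the edge weights, and the noise levels are all assumptions chosen for illustration.

```python
import random


def simulate_linear_gaussian_sem(weights, noise_sd, n_samples, seed=0):
    """Sample from X_j = sum_k weights[k][j] * X_k + eps_j, eps_j ~ N(0, noise_sd[j]^2).

    Assumption: nodes are indexed in a topological order, so weights[k][j]
    may be nonzero only when k < j (strictly upper-triangular weights).
    This guarantees the underlying graph is acyclic.
    """
    rng = random.Random(seed)
    m = len(noise_sd)
    # Reject any entry on or below the diagonal: it would be a self-loop
    # or an edge that runs against the assumed topological order.
    for k in range(m):
        for j in range(k + 1):
            if weights[k][j] != 0:
                raise ValueError("weights must be strictly upper triangular")
    samples = []
    for _ in range(n_samples):
        x = [0.0] * m
        for j in range(m):
            # Parents of node j all have smaller indices and are already sampled.
            mean = sum(weights[k][j] * x[k] for k in range(j))
            x[j] = mean + rng.gauss(0.0, noise_sd[j])
        samples.append(x)
    return samples


# Hypothetical chain 0 -> 1 -> 2 with unequal noise standard deviations.
W = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0],
]
sd = [1.0, 0.3, 2.0]
data = simulate_linear_gaussian_sem(W, sd, n_samples=2000)
```

Setting every entry of `sd` to the same value would give the homoscedastic (equal-variance) case that the abstract identifies as the restrictive assumption behind full identifiability.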
