Inconsistency of cross-validation for structure learning in Gaussian graphical models (2312.17047v1)
Abstract: Despite many years of research into the merits and trade-offs of various model selection criteria, robust results that elucidate the behavior of cross-validation remain elusive. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator of a node's neighborhood within a Gaussian graphical model, tuned by a prediction oracle, misidentifies that neighborhood. Our results pertain to both undirected and directed acyclic graphs and encompass general, sparse covariance structures. To support the theoretical findings, we investigate this inconsistency empirically, contrasting our outcomes with other commonly used information criteria in an extensive simulation study. Since many algorithms for learning the structure of graphical models require hyperparameter selection, and accurate structure estimation hinges on calibrating that hyperparameter precisely, our observations shed light on this widely recognized practical challenge.
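The setting the abstract describes can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' code: the chain graph, the dimensions, the seed, and the use of scikit-learn's LassoCV as a stand-in for the prediction oracle are all our own assumptions. It simulates data from a sparse Gaussian graphical model and estimates one node's neighborhood by Lasso regression on the remaining nodes, with the penalty chosen by 10-fold cross-validation.

```python
# Minimal sketch (not the paper's code): neighborhood selection for one
# node of a Gaussian graphical model, with the Lasso penalty tuned by
# 10-fold cross-validation. Graph, dimensions, and seed are arbitrary.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
p, n = 20, 200

# Sparse precision matrix for a chain graph, so node 0's true
# neighborhood is {1}.
Omega = np.eye(p)
for j in range(p - 1):
    Omega[j, j + 1] = Omega[j + 1, j] = 0.4
Sigma = np.linalg.inv(Omega)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Regress node 0 on the remaining nodes; the estimated neighborhood is
# the support of the cross-validated Lasso fit.
fit = LassoCV(cv=10).fit(X[:, 1:], X[:, 0])
est_nbhd = np.flatnonzero(fit.coef_) + 1  # re-index to nodes 1..p-1

print("true neighborhood:", [1])
print("CV-selected neighborhood:", est_nbhd.tolist())
# The prediction-optimal penalty typically retains spurious neighbors,
# illustrating the over-selection behavior the paper analyzes.
```

Repeating such a simulation typically shows the cross-validated fit including variables outside the true neighborhood, which is the kind of over-selection the paper's finite-sample bounds quantify; swapping the cross-validated penalty for one chosen by an information criterion such as BIC or EBIC, as in the paper's simulation study, tends to yield sparser estimated neighborhoods.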