Information criteria for structured parameter selection in high dimensional tree and graph models (2306.14026v1)

Published 24 Jun 2023 in cs.LG, math.ST, and stat.TH

Abstract: Parameter selection in high-dimensional models is typically fine-tuned in a way that keeps the (relative) number of false positives under control, because otherwise the few true positives may be dominated by the many possible false positives. This happens, for instance, when the selection follows from a naive optimisation of an information criterion such as AIC or Mallows's Cp. It can be argued that the overestimation of the selection comes from the optimisation process itself changing the statistics of the selected variables, so that the information criterion no longer reflects the true divergence between the selection and the data-generating process. In the lasso, the overestimation can also be linked to the shrinkage estimator, which makes the selection too tolerant of false positives. For these reasons, this paper proposes refined information criteria, carefully balancing false positives and false negatives, for use with estimators without shrinkage. In particular, the paper develops corrected Mallows's Cp criteria for structured selection in trees and graphical models.
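
As a concrete illustration of the overestimation described in the abstract (not taken from the paper), the sketch below simulates a high-dimensional regression and runs a greedy forward selection scored by the classical Mallows's Cp, here Cp = RSS/sigma^2 + 2k - n with the noise variance sigma^2 assumed known. Because the criterion is minimised over many candidate variables, spurious correlations keep lowering Cp and the procedure typically admits false positives on top of the true effects. The dimensions, the greedy search strategy, and the known-variance assumption are all choices made for this demonstration only.

```python
# Illustrative sketch (assumptions, not the paper's corrected criterion):
# naive minimisation of the classical Mallows's Cp during a greedy forward
# search tends to overselect, because the search itself inflates the apparent
# fit of the chosen variables.
import numpy as np

rng = np.random.default_rng(0)
n, p, k_true = 100, 200, 5              # p > n, only 5 true effects
sigma = 1.0                              # noise level, assumed known here
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k_true] = 2.0
y = X @ beta + sigma * rng.standard_normal(n)

def cp(rss, k):
    """Classical Mallows's Cp with known noise variance."""
    return rss / sigma**2 + 2 * k - n

selected = []
residual = y.copy()
best = cp(float(y @ y), 0)               # empty model: RSS is the total sum of squares
while True:
    # add the column most correlated with the current residual
    j = int(np.argmax(np.abs(X.T @ residual)))
    trial = selected + [j]
    coef = np.linalg.lstsq(X[:, trial], y, rcond=None)[0]
    fitted = X[:, trial] @ coef
    score = cp(float(np.sum((y - fitted) ** 2)), len(trial))
    if score >= best:                    # stop once Cp no longer improves
        break
    best, selected, residual = score, trial, y - fitted

false_pos = sum(j >= k_true for j in selected)
print(f"selected {len(selected)} variables, {false_pos} false positives")
```

Running this typically reports noticeably more than the five true variables; the corrected criteria developed in the paper are aimed at removing precisely this optimisation-induced bias.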
