
High-dimensional Inference and FDR Control for Simulated Markov Random Fields (2202.05612v3)

Published 11 Feb 2022 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Identifying important features linked to a response variable is a fundamental task in various scientific domains. This article explores statistical inference for simulated Markov random fields in high-dimensional settings. We introduce a methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We propose a decorrelated score test, establishing both its asymptotic normality and that of a one-step estimator, along with the associated confidence interval. Furthermore, we construct two false discovery rate control procedures based on the asymptotic behavior of the p-values and of the e-values, respectively. Comprehensive numerical simulations confirm the theoretical validity of the proposed methods.
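
To make the idea of false discovery rate control from p-values and from e-values concrete, here is a minimal sketch of the standard Benjamini-Hochberg (BH) step-up rule and its e-value analogue (e-BH). This is an illustration under stated assumptions, not the paper's procedure: the abstract does not specify how its p-values and e-values are built or thresholded, and the function names `bh_reject` and `ebh_reject`, the level `alpha`, and the toy inputs are hypothetical.

```python
def bh_reject(p_values, alpha=0.1):
    """Standard BH step-up rule (assumed here for illustration).

    Rejects the k smallest p-values, where k is the largest rank
    with p_(k) <= alpha * k / m. Returns sorted indices of rejections.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes
    return sorted(order[:k])


def ebh_reject(e_values, alpha=0.1):
    """Standard e-BH rule (assumed here for illustration).

    Rejects the k largest e-values, where k is the largest rank
    with e_[k] >= m / (alpha * k). Returns sorted indices of rejections.
    """
    m = len(e_values)
    order = sorted(range(m), key=lambda i: -e_values[i])  # descending e-values
    k = 0
    for rank, idx in enumerate(order, start=1):
        if e_values[idx] >= m / (alpha * rank):
            k = rank
    return sorted(order[:k])


# Toy inputs (made up for illustration, not taken from the paper).
toy_p = [0.001, 0.02, 0.03, 0.5, 0.9]
toy_e = [100.0, 30.0, 1.0, 0.5, 20.0]
bh_selected = bh_reject(toy_p, alpha=0.1)
ebh_selected = ebh_reject(toy_e, alpha=0.1)
```

In both rules a feature is selected when its evidence clears a threshold that loosens as more features are selected; large e-values play the role that small p-values play in BH.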

