Distributed variable screening for generalized linear models (2405.04254v2)

Published 7 May 2024 in stat.ME

Abstract: In this article, we develop a distributed variable screening method for generalized linear models. The method is designed for situations where both the sample size and the number of covariates are large. Specifically, it selects relevant covariates using a sparsity-restricted surrogate likelihood estimator, taking into account the joint effects of the covariates rather than only their marginal effects; this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that, with high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite-sample performance of the proposed method, and an application to a real dataset showcases its practical utility.
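
To make the general recipe in the abstract concrete, below is a minimal sketch, not the paper's estimator, under several assumptions that are not stated in the abstract: a logistic link, a gradient-corrected surrogate loss of the communication-efficient type, and iterative hard thresholding as the sparsity restriction. All function names (local_gradient, distributed_screen, etc.) and tuning constants are illustrative.

```python
# Illustrative sketch (not the paper's exact algorithm): distributed variable
# screening for logistic regression via a gradient-corrected surrogate loss and
# iterative hard thresholding. Names and tuning choices are assumptions.
import numpy as np

def local_gradient(beta, X, y):
    """Gradient of the negative log-likelihood for logistic regression on one machine."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / X.shape[0]

def surrogate_gradient(beta, X1, y1, beta0, global_grad0):
    """Gradient of a surrogate loss: the local loss on machine 1 plus a linear
    correction so that its gradient at beta0 equals the pooled (global) gradient."""
    return local_gradient(beta, X1, y1) - local_gradient(beta0, X1, y1) + global_grad0

def hard_threshold(beta, k):
    """Keep the k largest coefficients in absolute value, zero out the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out

def distributed_screen(X_blocks, y_blocks, k, n_iter=200, lr=0.5):
    """Screen covariates: machines send local gradients once, then the central
    machine runs projected gradient descent (hard thresholding) on the surrogate loss."""
    d = X_blocks[0].shape[1]
    beta0 = np.zeros(d)                      # initial estimate broadcast to all machines
    grads = [local_gradient(beta0, X, y) for X, y in zip(X_blocks, y_blocks)]
    global_grad0 = np.mean(grads, axis=0)    # one round of gradient communication
    beta = beta0.copy()
    for _ in range(n_iter):
        g = surrogate_gradient(beta, X_blocks[0], y_blocks[0], beta0, global_grad0)
        beta = hard_threshold(beta - lr * g, k)
    return np.flatnonzero(beta)              # indices of the selected covariates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k_true = 2000, 100, 5
    beta_true = np.zeros(d); beta_true[:k_true] = 1.5
    X = rng.standard_normal((n, d))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    X_blocks, y_blocks = np.array_split(X, 4), np.array_split(y, 4)   # 4 machines
    print("selected:", distributed_screen(X_blocks, y_blocks, k=10))
```

In this sketch, screening keeps the k covariates with the largest coefficients of the sparsity-restricted surrogate fit, so the joint effects of the covariates enter through the optimization rather than through marginal statistics; the selected index set would then be passed to a downstream refit or model-selection step.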
