
Adaptive debiased SGD in high-dimensional GLMs with streaming data (2405.18284v3)

Published 28 May 2024 in stat.ML and cs.LG

Abstract: Online statistical inference facilitates real-time analysis of sequentially collected data, distinguishing it from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, in which regression coefficient estimates and their standard errors are updated as each new observation arrives. In contrast to existing methods that require either access to the full dataset or storage of large-dimensional summary statistics, our method operates in a single-pass mode, significantly reducing both time and space complexity. The core of our methodological innovation is an adaptive stochastic gradient descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure. This allows us to maintain low-dimensional summary statistics while effectively controlling the optimization error introduced by the dynamically changing loss functions. We establish the asymptotic normality of the proposed Adaptive Debiased Lasso (ADL) estimator. Extensive simulation experiments demonstrate the statistical validity and computational efficiency of the ADL estimator across various settings, and its computational efficiency is further illustrated via a real-data application to spam email classification.
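To make the abstract's single-pass update pattern concrete, the sketch below implements plain averaged SGD for streaming logistic regression (one of the GLMs the paper covers). This is a minimal illustration, not the paper's ADL procedure: ADL additionally uses an adaptive step size matched to the dynamically changing penalized loss and an online debiasing correction that yields valid standard errors. The class name, step-size schedule, and all constants here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogisticSGD:
    """Single-pass averaged SGD for logistic regression.

    Illustrative only: shows the O(p)-memory streaming update the
    abstract describes, without the Lasso penalty, adaptive step
    sizes, or online debiasing of the actual ADL estimator.
    """

    def __init__(self, p, step_scale=0.5, step_decay=0.51):
        self.beta = np.zeros(p)       # current SGD iterate
        self.beta_bar = np.zeros(p)   # Polyak-Ruppert running average
        self.t = 0
        self.step_scale = step_scale  # gamma_t = step_scale * t^(-step_decay)
        self.step_decay = step_decay  # decay in (1/2, 1), standard for averaging

    def update(self, x, y):
        """Consume one observation (x, y) with x in R^p and y in {0, 1}."""
        self.t += 1
        gamma = self.step_scale * self.t ** (-self.step_decay)
        grad = (sigmoid(x @ self.beta) - y) * x   # per-observation logistic score
        self.beta -= gamma * grad
        # Incremental running average; each observation is touched exactly once.
        self.beta_bar += (self.beta - self.beta_bar) / self.t
```

A quick usage example on synthetic streaming data, with a sparse true coefficient vector:

```python
rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.25]          # sparse truth
model = OnlineLogisticSGD(p)
for _ in range(5000):                      # observations arrive one at a time
    x = rng.normal(size=p)
    y = rng.binomial(1, sigmoid(x @ beta_true))
    model.update(x, y)
```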

