
A Novel Framework for Online Supervised Learning with Feature Selection (1803.11521v10)

Published 30 Mar 2018 in stat.ML and cs.LG

Abstract: Current online learning methods suffer from issues such as lower convergence rates and a limited capability to select important features compared to their offline counterparts. In this paper, a novel framework for online learning based on running averages is proposed. Online versions of many popular offline regularized methods, such as Lasso, Elastic Net, Minimax Concave Penalty (MCP), and Feature Selection with Annealing (FSA), are introduced within this framework. The equivalence between the proposed online methods and their offline counterparts is proved, and novel theoretical guarantees on true support recovery and convergence are then provided for some of the methods in this framework. Numerical experiments indicate that the proposed methods enjoy high true support recovery accuracy and faster convergence compared with conventional online and offline algorithms. Finally, applications to large datasets are presented, where the proposed framework again shows competitive results compared to popular online and offline algorithms.
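To make the running-averages idea concrete, below is a minimal Python sketch of an online Lasso built on this principle. It assumes the data stream is summarized by running averages of the second-moment statistics x xᵀ and y·x, after which an offline-style solver (coordinate-descent Lasso here) is run on those summaries; the class name, penalty value, and solver details are illustrative assumptions, not taken from the paper.

```python
import numpy as np


class RunningAverageLasso:
    """Minimal sketch: online Lasso via running averages of sufficient statistics.

    The stream is summarized by running averages of x x^T and y * x, so memory
    is O(p^2) regardless of the number of samples seen, and the coefficients
    recovered from these averages match an offline Lasso fit (up to solver
    tolerance) on all data observed so far.
    """

    def __init__(self, p, lam=0.1):
        self.n = 0
        self.Sxx = np.zeros((p, p))   # running average of x x^T
        self.Sxy = np.zeros(p)        # running average of y * x
        self.lam = lam                # L1 penalty strength (illustrative value)
        self.beta = np.zeros(p)

    def partial_fit(self, x, y):
        """Update the running averages with one observation (x, y)."""
        self.n += 1
        w = 1.0 / self.n
        self.Sxx += w * (np.outer(x, x) - self.Sxx)
        self.Sxy += w * (y * x - self.Sxy)

    def solve(self, n_iter=200):
        """Coordinate descent on 0.5*b'Ab - c'b + lam*||b||_1 over the stored averages."""
        A, c, beta = self.Sxx, self.Sxy, self.beta
        for _ in range(n_iter):
            for j in range(beta.size):
                # Partial residual for coordinate j, then soft-thresholding.
                r = c[j] - A[j] @ beta + A[j, j] * beta[j]
                beta[j] = np.sign(r) * max(abs(r) - self.lam, 0.0) / A[j, j] if A[j, j] > 0 else 0.0
        self.beta = beta
        return beta


# Toy usage: stream 5000 samples with a sparse true coefficient vector.
rng = np.random.default_rng(0)
p, beta_true = 20, np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
model = RunningAverageLasso(p, lam=0.05)
for _ in range(5000):
    x = rng.normal(size=p)
    model.partial_fit(x, x @ beta_true + 0.1 * rng.normal())
print(np.nonzero(np.abs(model.solve()) > 1e-6)[0])   # recovered support
```

Because the stored summaries have fixed size independent of the number of samples, an estimator computed from them can coincide with the offline fit on all data seen so far, which is the sense in which such online methods can be equivalent to their offline counterparts.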

References (42)
  1. Barbu, A., She, Y., Ding, L., and Gramajo, G. (2017), “Feature selection with annealing for computer vision and big data learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 272–286.
  2. Cai, Y., Sun, Y., Li, J., and Goodison, S. (2009), “Online feature selection algorithm with Bayesian ℓ1 regularization,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining.
  3. Chu, C.-T., Kim, S. K., Lin, Y.-A., Yu, Y., Bradski, G., Olukotun, K., and Ng, A. Y. (2007), “Map-reduce for machine learning on multicore,” in NIPS, pp. 281–288.
  4. Cotter, A., Shamir, O., Srebro, N., and Sridharan, K. (2011), “Better mini-batch algorithms via accelerated gradient methods,” in NIPS, pp. 1647–1655.
  5. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009), “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248–255.
  6. Duchi, J. and Singer, Y. (2009), “Efficient online and batch learning using forward backward splitting,” Journal of Machine Learning Research, 10, 2899–2934.
  7. Fan, J., Gong, W., Li, C. J., and Sun, Q. (2018), “Statistical sparse online regression: A diffusion approximation perspective,” in AISTATS, pp. 1017–1026.
  8. Fan, J. and Li, R. (2001), “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, 96, 1348–1360.
  9. Hazan, E., Agarwal, A., and Kale, S. (2007), “Logarithmic regret algorithms for online convex optimization,” Machine Learning, 69, 169–192.
  10. Javanmard, A. (2017), “Perishability of data: dynamic pricing under varying-coefficient models,” The Journal of Machine Learning Research, 18, 1714–1744.
  11. Kearns, M. (1998), “Efficient noise-tolerant learning from statistical queries,” Journal of the ACM (JACM), 45, 983–1006.
  12. Langford, J., Li, L., and Zhang, T. (2009), “Sparse online learning via truncated gradient,” Journal of Machine Learning Research, 10, 777–801.
  13. Liang, F., Xue, J., and Jia, B. (2022), “Markov neighborhood regression for high-dimensional inference,” Journal of the American Statistical Association, 117, 1200–1214.
  14. Lichman, M. (2013), “UCI Machine Learning Repository.”
  15. Loh, P.-L., Wainwright, M. J., et al. (2017), “Support recovery without incoherence: A case for nonconvex regularization,” The Annals of Statistics, 45, 2455–2482.
  16. Luo, L. and Song, P. X.-K. (2019), “Renewable estimation and incremental inference in generalized linear models with streaming data sets,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82.
  17. — (2023), “Multivariate online regression analysis with heterogeneous streaming data,” Canadian Journal of Statistics, 51, 111–133.
  18. Nesterov, Y. (2009), “Primal-dual subgradient methods for convex problems,” Mathematical Programming, 120, 221–259.
  19. Neykov, M., Liu, J. S., and Cai, T. (2016), “L1-regularized least squares for support recovery of high dimensional single index models with Gaussian designs,” Journal of Machine Learning Research, 17, 1–37.
  20. Nguyen, N., Needell, D., and Woolf, T. (2017), “Linear convergence of stochastic iterative greedy algorithms with sparse constraints,” IEEE Transactions on Information Theory, 63, 6869–6895.
  21. Ouyang, H., He, N., Tran, L., and Gray, A. (2013), “Stochastic alternating direction method of multipliers,” in ICML, pp. 80–88.
  22. Page, E. S. (1955), “A test for a change in a parameter occurring at an unknown point,” Biometrika, 42, 523–527.
  23. Qiang, S. and Bayati, M. (2016), “Dynamic pricing with demand covariates,” Available at SSRN 2765257.
  24. Rothe, R., Timofte, R., and Van Gool, L. (2015), “Dex: Deep expectation of apparent age from a single image,” in ICCV Workshops, pp. 10–15.
  25. — (2018), “Deep expectation of real and apparent age from a single image without facial landmarks,” International Journal of Computer Vision, 126, 144–157.
  26. Schifano, E., Wu, J., Wang, C., Yan, J., and Chen, M.-H. (2016), “Online updating of statistical inference in the big data setting,” Technometrics, 58, 393–403.
  27. She, Y. et al. (2009), “Thresholding-based iterative selection procedures for model selection and shrinkage,” Electronic Journal of Statistics, 3, 384–415.
  28. Simonyan, K. and Zisserman, A. (2014), “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556.
  29. Suzuki, T. (2013), “Dual averaging and proximal gradient descent for online alternating direction multiplier method,” in ICML, pp. 392–400.
  30. Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288.
  31. Wainwright, M. J. (2009), “Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso),” IEEE Transactions on Information Theory, 55, 2183–2202.
  32. Wang, J. and Li, H. (2021), “Estimation of genetic correlation with summary association statistics,” Biometrika, 109, 421–438.
  33. Wang, J., Zhao, P., Hoi, S. C. H., and Jin, R. (2014), “Online feature selection and its applications,” IEEE Transactions on Knowledge and Data Engineering, 26, 698–710.
  34. Wu, Y., Hoi, S. C., Mei, T., and Yu, N. (2017), “Large-scale online feature selection for ultra-high dimensional sparse data,” ACM Transactions on Knowledge Discovery from Data (TKDD), 11, 48.
  35. Xiao, L. (2010), “Dual averaging methods for regularized stochastic learning and online optimization,” Journal of Machine Learning Research, 11, 2543–2596.
  36. Yalniz, I. Z., Jégou, H., Chen, K., Paluri, M., and Mahajan, D. (2019), “Billion-scale semi-supervised learning for image classification,” arXiv preprint arXiv:1905.00546.
  37. Yang, H., Fujimaki, R., Kusumura, Y., and Liu, J. (2016), “Online feature selection: A limited-memory substitution algorithm and its asynchronous parallel variation,” in SIGKDD, ACM, pp. 1945–1954.
  38. Yu, M. and Chen, X. (2017), “Finite sample change point inference and identification for high-dimensional mean vectors,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83, 247–270.
  39. Yuan, X., Li, P., and Zhang, T. (2014), “Gradient hard thresholding pursuit for sparsity-constrained optimization,” in ICML, pp. 127–135.
  40. Zhang, C.-H. (2010), “Nearly unbiased variable selection under minimax concave penalty,” Annals of Statistics, 38, 894–942.
  41. Zinkevich, M. (2003), “Online convex programming and generalized infinitesimal gradient ascent,” in ICML, pp. 928–936.
  42. Zou, H. and Hastie, T. (2005), “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.