High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise (2208.08567v2)

Published 17 Aug 2022 in math.OC and stat.ML

Abstract: In this work we study high probability bounds for stochastic subgradient methods under heavy tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which standard subgradient methods are known to enjoy high probability bounds. We analyze a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads to near-optimal anytime and finite-horizon bounds for many classical averaging schemes. Preliminary experiments are presented to support the validity of the method.
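The clipping strategy described in the abstract can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: the ball constraint set, the 1/sqrt(t) step size, the constant clipping level, the uniform running average, and the toy heavy-tailed objective are all placeholder assumptions introduced here for illustration.

```python
import numpy as np

def clip(g, tau):
    """Truncate a subgradient estimate whenever its norm exceeds tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else (tau / norm) * g

def project_ball(x, radius):
    """Euclidean projection onto a centered ball (placeholder constraint set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def clipped_projected_subgradient(oracle, x0, n_iter, eta, tau, radius=10.0):
    """Projected stochastic subgradient method with clipped estimates and averaging.

    oracle(x) returns a noisy subgradient estimate at x; eta(t) and tau(t)
    give the step size and clipping level at iteration t (assumed schedules).
    """
    x = np.asarray(x0, dtype=float)
    x_avg = x.copy()
    for t in range(1, n_iter + 1):
        g = oracle(x)                                     # heavy-tailed stochastic subgradient
        x = project_ball(x - eta(t) * clip(g, tau(t)), radius)
        x_avg += (x - x_avg) / (t + 1)                    # uniform running average of iterates
    return x_avg

# Toy use: minimize f(x) = 0.5 * E||x - z||^2 where z has heavy-tailed
# (Student-t, df = 2.5) coordinates, so the noise has finite variance but
# is not sub-Gaussian; the stochastic gradient x - z is occasionally huge.
rng = np.random.default_rng(0)
oracle = lambda x: x - rng.standard_t(df=2.5, size=x.shape)
x_hat = clipped_projected_subgradient(
    oracle, x0=np.full(5, 3.0), n_iter=20_000,
    eta=lambda t: 1.0 / np.sqrt(t), tau=lambda t: 10.0)
print(x_hat)  # should be close to E[z] = 0
```

In this sketch, clipping caps the influence of the rare, very large noise realizations that a Student-t oracle produces, which is the mechanism the paper analyzes to recover high probability guarantees under heavy-tailed noise.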
