High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise (2208.08567v2)
Abstract: In this work we study high probability bounds for stochastic subgradient methods under heavy-tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which it is known that standard subgradient methods enjoy high probability bounds. We analyze a clipped version of the projected stochastic subgradient method, in which subgradient estimates are truncated whenever their norm exceeds a threshold. We show that this clipping strategy yields near-optimal any-time and finite-horizon bounds for many classical averaging schemes. Preliminary experiments support the validity of the method.
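The clipping strategy described in the abstract is simple to state: whenever the stochastic subgradient estimate has norm larger than a threshold, it is rescaled onto the ball of that radius before the projected step is taken. The sketch below is a minimal illustration of such a clipped projected stochastic subgradient loop; the names `subgrad_oracle` and `project`, the uniform averaging at the end, and the example step-size and clipping schedules are assumptions for illustration, not the paper's exact algorithm or tuning.

```python
import numpy as np

def clipped_projected_subgradient(subgrad_oracle, project, x0, step_sizes, clip_levels, n_iter):
    """Sketch of a clipped projected stochastic subgradient method.

    At each step a noisy subgradient g_t is drawn, truncated to norm at most
    clip_levels[t], and a projected step is taken:
        x_{t+1} = project(x_t - step_sizes[t] * clip(g_t)).
    Returns the last iterate and a uniform average of all iterates.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(n_iter):
        g = subgrad_oracle(x)                  # noisy subgradient (only finite variance assumed)
        norm_g = np.linalg.norm(g)
        lam = clip_levels[t]
        if norm_g > lam:                       # clip: rescale large-norm estimates onto the ball of radius lam
            g = g * (lam / norm_g)
        x = project(x - step_sizes[t] * g)     # projected subgradient step
        iterates.append(x.copy())
    x_avg = np.mean(iterates, axis=0)          # one simple averaging scheme among many
    return x, x_avg

# Example (hypothetical setup): minimize f(x) = ||x||_1 over the unit ball,
# with heavy-tailed Student-t noise (finite variance, not sub-Gaussian).
rng = np.random.default_rng(0)
oracle = lambda x: np.sign(x) + rng.standard_t(df=3, size=x.shape)
proj = lambda z: z / max(1.0, np.linalg.norm(z))   # projection onto the unit Euclidean ball
T = 1000
x_last, x_avg = clipped_projected_subgradient(
    oracle, proj, x0=np.ones(5),
    step_sizes=[1.0 / np.sqrt(t + 1) for t in range(T)],
    clip_levels=[np.sqrt(t + 1) for t in range(T)],
    n_iter=T)
```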