Papers
Topics
Authors
Recent
Search
2000 character limit reached

Beating SGD Saturation with Tail-Averaging and Minibatching

Published 22 Feb 2019 in stat.ML and cs.LG | (1902.08668v2)

Abstract: While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that a combination of tail-averaging and minibatching allows more aggressive step-size choices than using any one of said components.

Citations (35)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.