(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum (2401.06738v2)

Published 12 Jan 2024 in math.OC, cs.LG, and stat.ML

Abstract: Stochastic heavy ball momentum (SHB) is commonly used to train machine learning models, and often provides empirical improvements over stochastic gradient descent. By primarily focusing on strongly-convex quadratics, we aim to better understand the theoretical advantage of SHB and subsequently improve the method. For strongly-convex quadratics, Kidambi et al. (2018) show that SHB (with a mini-batch of size $1$) cannot attain accelerated convergence, and hence has no theoretical benefit over SGD. They conjecture that the practical gain of SHB is a by-product of using larger mini-batches. We first substantiate this claim by showing that SHB can attain an accelerated rate when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$. Specifically, we prove that with the same step-size and momentum parameters as in the deterministic setting, SHB with a sufficiently large mini-batch size results in an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \sigma\right)$ convergence, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. We prove a lower-bound which demonstrates that a $\kappa$ dependence in $b^*$ is necessary. To ensure convergence to the minimizer, we design a noise-adaptive multi-stage algorithm that results in an $O\left(\exp\left(-\frac{T}{\sqrt{\kappa}}\right) + \frac{\sigma}{T}\right)$ rate. We also consider the general smooth, strongly-convex setting and propose the first noise-adaptive SHB variant that converges to the minimizer at an $O\left(\exp\left(-\frac{T}{\kappa}\right) + \frac{\sigma^2}{T}\right)$ rate. We empirically demonstrate the effectiveness of the proposed algorithms.
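For readers less familiar with the iteration studied in the abstract, the sketch below shows stochastic heavy-ball momentum on a strongly-convex quadratic, using the classical deterministic step-size and momentum choices expressed through the extreme eigenvalues of the quadratic. The function name, the additive-Gaussian model for mini-batch gradient noise, and the parameter values are illustrative assumptions for this sketch, not the authors' implementation or their multi-stage noise-adaptive variant.

```python
import numpy as np

def shb_quadratic(A, b, x0, batch_noise_std=0.1, T=1000, seed=None):
    """Minimal sketch: SHB on f(x) = 0.5 * x^T A x - b^T x (assumed setup).

    Mini-batch gradient noise is simulated by additive Gaussian noise with
    standard deviation `batch_noise_std`; a larger mini-batch would correspond
    to a smaller noise scale.
    """
    rng = np.random.default_rng(seed)
    eigs = np.linalg.eigvalsh(A)
    mu, L = eigs[0], eigs[-1]  # smallest / largest eigenvalue (mu > 0 assumed)

    # Classical deterministic heavy-ball parameters for quadratics
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                      # step-size
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum

    x_prev, x = x0.copy(), x0.copy()
    for _ in range(T):
        # Stochastic gradient = exact gradient + simulated mini-batch noise
        grad = A @ x - b + batch_noise_std * rng.standard_normal(x.shape)
        # Heavy-ball update: gradient step plus momentum term
        x_new = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x
```

In this sketch, shrinking `batch_noise_std` (i.e., enlarging the effective mini-batch) lowers the noise floor of the iterates, which mirrors the abstract's point that acceleration only emerges once the mini-batch size exceeds a condition-number-dependent threshold.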

Authors (3)
  1. Anh Dang (3 papers)
  2. Reza Babanezhad (18 papers)
  3. Sharan Vaswani (35 papers)
