Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Published 15 Jul 2025 in cs.LG, math.OC, and stat.ML (arXiv:2507.11274v1)

Abstract: We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $\eta \leq 1/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(1/(\eta T^{1-\beta\eta/2}) + \eta T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $O(1/\sqrt{T})$ with $\eta=1/\beta$, improving upon the best-known $O(T^{-1/4})$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.
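The setting described in the abstract can be illustrated with a small simulation. This is an illustrative sketch under stated assumptions, not the authors' code: it runs constant-stepsize SGD on a hypothetical realizable least-squares problem (so $\sigma_\star = 0$), treats a finite set of $n$ rows sampled uniformly as the "population", and records the risk of the last iterate after every step. With unit-norm rows each per-sample loss is 1-smooth, so $\beta = 1$, and the stepsize $\eta = 1/\beta$ turns every SGD step into a projection onto the sampled equation, which is the randomized Kaczmarz method. The helper names `population_risk` and `sgd_last_iterate`, the problem sizes, and the seeds are hypothetical choices made for illustration.

```python
import numpy as np

def population_risk(X, y, w):
    # F(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2, treating the n rows as the
    # whole data distribution (an assumption made for this illustration).
    return 0.5 * np.mean((X @ w - y) ** 2)

def sgd_last_iterate(X, y, eta, T, seed=0):
    """Constant-stepsize SGD, one uniformly sampled row per step.

    Returns the last iterate w_T and the risk after every step (T + 1 values).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    risks = [population_risk(X, y, w)]
    for _ in range(T):
        i = rng.integers(n)               # sample index i uniformly from {0, ..., n-1}
        residual = X[i] @ w - y[i]
        w = w - eta * residual * X[i]     # stochastic gradient of 0.5*(x_i . w - y_i)^2
        risks.append(population_risk(X, y, w))
    return w, risks

# Hypothetical realizable instance: a consistent linear system, so sigma_star = 0.
rng = np.random.default_rng(42)
n, d = 200, 50
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
w_star = rng.standard_normal(d)
y = X @ w_star

# Each f_i is ||x_i||^2-smooth, so beta = max_i ||x_i||^2 (= 1 up to rounding here).
beta = np.max(np.sum(X ** 2, axis=1))
eta = 1.0 / beta  # the largest stepsize covered by the bound

w_T, risks = sgd_last_iterate(X, y, eta, T=2000)
print(f"initial risk {risks[0]:.3e}, last-iterate risk {risks[-1]:.3e}")
```

Plugging $\sigma_\star = 0$ and $\eta = 1/\beta$ into the bound above leaves only the first term, $1/(\eta T^{1-1/2}) = \beta/\sqrt{T}$, which is the $O(1/\sqrt{T})$ last-iterate rate stated in the abstract. Because that rate is a worst-case guarantee, a well-conditioned instance like this one typically converges considerably faster.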
