
Adaptivity to interpolation for last-iterate SGD without knowing σ_*^2

Develop a Stochastic Gradient Descent step-size or scheduling rule whose last-iterate guarantee adapts to interpolation: it should attain O(D^2/T) when the gradient variance at the solution satisfies σ_*^2 = 0, and O(ln(T)/√T) when σ_*^2 > 0, without requiring prior knowledge of σ_*^2 or other problem-specific interpolation constants.


Background

The paper highlights an ‘ideal’ adaptive bound that switches between O(1/T) under interpolation and O(ln(T)/√T) otherwise, but notes that achieving such adaptivity without knowledge of σ_*^2 is currently unknown for SGD. Solving this would yield algorithms that automatically exploit interpolation when present while maintaining optimal rates otherwise.
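The tension behind this open problem can be illustrated numerically. The sketch below (a hypothetical toy setup, not from the paper) runs SGD on a streaming least-squares problem where the parameter `sigma_star` controls the gradient variance at the solution: `sigma_star = 0` gives interpolation. Each of the two fixed step-size schedules is suited to only one regime, and picking between them amounts to knowing σ_*^2 in advance, which is exactly the knowledge the problem asks to remove.

```python
import numpy as np


def sgd_last_iterate(step_fn, sigma_star, T=2000, d=5, seed=0):
    """Run SGD on f(x) = 0.5 * E[(a^T x - b)^2] and return last-iterate error.

    Toy model (an assumption for illustration): a ~ N(0, I_d) and
    b = a^T x_star + eps with eps ~ N(0, sigma_star^2), so sigma_star = 0
    means every sample is fit exactly at x_star (interpolation).
    """
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=d)
    x = np.zeros(d)
    for t in range(T):
        a = rng.normal(size=d)
        b = a @ x_star + sigma_star * rng.normal()
        grad = (a @ x - b) * a  # stochastic gradient of 0.5*(a^T x - b)^2
        x = x - step_fn(t, T) * grad
    return float(np.sum((x - x_star) ** 2))  # squared distance to solution


# Two non-adaptive schedules, each good in only one regime:
const_step = lambda t, T: 0.1                # constant step: fast under interpolation
decay_step = lambda t, T: 0.1 / np.sqrt(T)   # O(1/sqrt(T)) step: robust when sigma_* > 0

for sigma in (0.0, 1.0):
    e_const = sgd_last_iterate(const_step, sigma)
    e_decay = sgd_last_iterate(decay_step, sigma)
    print(f"sigma_* = {sigma}: const-step err = {e_const:.2e}, "
          f"decayed-step err = {e_decay:.2e}")
```

Under interpolation the constant step drives the error geometrically to (near) zero, while under noise it stalls at a variance floor proportional to the step size and the decayed schedule wins instead. An adaptive rule would have to recover the better behavior in both regimes without being told which one it is in.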

References

As far as we know, there is no known result for SGD which is able to achieve the ideal interpolation bound while at the same time being adaptive to interpolation.

Last-Iterate Complexity of SGD for Convex and Smooth Stochastic Problems (2507.14122 - Garrigos et al., 18 Jul 2025) in Remark “About (non-)adaptivity to interpolation,” Section 3 (Main results)