
Tightness of high‑probability nonconvex SGD rate

Determine whether the best‑known high‑probability convergence guarantee for stochastic gradient descent on L‑smooth, possibly nonconvex functions is tight. The guarantee (as in Liu et al., 2023) bounds the average squared gradient norm by a rate of order sqrt(L·(f(x0)−f*)·R^2/T) + L·(f(x0)−f*)/T, plus an additive R^2·log(1/δ)/T term reflecting the failure probability δ. Resolving the question means either establishing matching lower bounds showing that this bound cannot be improved in general, or showing that a better high‑probability rate is achievable.
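For concreteness, the following is a hedged reconstruction of the bound in display form, assuming Δ := f(x0) − f* and that R^2 denotes the bound on the gradient-noise variance; the exact symbols and constants may differ from the source:

\[
\frac{1}{T}\sum_{t=1}^{T} \|\nabla f(x_t)\|^2
\;=\; O\!\left( \sqrt{\frac{L\,\Delta\,R^2}{T}} \;+\; \frac{L\,\Delta}{T} \;+\; \frac{R^2 \log(1/\delta)}{T} \right)
\quad \text{with probability at least } 1-\delta.
\]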


Background

In the nonconvex setting, the paper reviews the best‑known high‑probability bound for SGD on the average squared gradient norm (citing Liu et al., 2023). While the corresponding in‑expectation rate is known to be tight (via the lower bound of Arjevani et al., 2019), the authors note that it is unknown whether the high‑probability rate is tight.

This question is central to the paper because it provides a tuning‑free variant of SGD that matches this high‑probability rate up to polylogarithmic factors. Establishing tightness would clarify whether the resulting guarantees are optimal or leave room for improvement.
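As a point of reference (a standard fact, not taken verbatim from the paper), the classical SGD analysis attains a rate of this form only when the stepsize is tuned to problem constants that are typically unknown in practice, for example

\[
\eta \;=\; \min\!\left\{ \frac{1}{L},\; \sqrt{\frac{\Delta}{L\,R^2\,T}} \right\},
\]

up to absolute constants. This dependence on L, Δ, and R^2 is exactly the kind of prior knowledge a tuning‑free method must do without, which is why matching the rate without tuning is the paper's contribution and why the tightness of the target rate matters.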

References

"This rate is known to be tight for convergence in expectation~\citep{arjevani19_lower_bound_non_convex_stoch_optim}. However, it is not known if it is tight for returning a high probability guarantee."

Tuning-Free Stochastic Optimization (arXiv:2402.07793, Khaled et al., 12 Feb 2024), Section 6 "Nonconvex Tuning-Free Optimization", paragraph preceding Eq. (40).