Tightness of high‑probability nonconvex SGD rate
Determine whether the best-known high-probability convergence guarantee for stochastic gradient descent on L-smooth, possibly nonconvex functions is tight. The guarantee in question (as in Liu et al., 2023) bounds the average squared gradient norm, with probability at least 1−δ, by a rate of order sqrt(L·(f(x0)−f*)·R^2/T) + L·(f(x0)−f*)/T plus an additive R^2·log(1/δ)/T term. Tightness here means either establishing a matching lower bound showing that this rate cannot be improved in general, or showing that it can be improved.
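A schematic rendering of the bound in question, writing Δ = f(x0) − f* for the initial suboptimality and R^2 for the gradient-noise parameter; the exact constants and parameterization in Liu et al. (2023) may differ:

\[
\frac{1}{T}\sum_{t=1}^{T}\|\nabla f(x_t)\|^{2}
\;\lesssim\;
\sqrt{\frac{L\,\Delta\,R^{2}}{T}} \;+\; \frac{L\,\Delta}{T} \;+\; \frac{R^{2}\log(1/\delta)}{T},
\qquad \text{with probability at least } 1-\delta,
\quad \Delta := f(x_0) - f^{*}.
\]

Roughly speaking, the first two terms match the rate that is known to be tight in expectation, so the open question centers on whether the additive R^2·log(1/δ)/T term is unavoidable in the high-probability setting.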
References
"This rate is known to be tight for convergence in expectation~\citep{arjevani19_lower_bound_non_convex_stoch_optim}. However, it is not known if it is tight for returning a high probability guarantee."
— Tuning-Free Stochastic Optimization
(Khaled et al., arXiv:2402.07793, 12 Feb 2024), Section 6 "Nonconvex Tuning-Free Optimization", paragraph preceding Eq. (40)