
Parameter-free attainment of optimal heavy-tailed sample complexity

Determine whether there exists any algorithm that, without knowledge of problem-dependent parameters (including the smoothness constant L, the initial optimality gap Δ1, the noise scale σ, and the tail index p), achieves the optimal heavy-tailed sample complexity for finding an ε-stationary point in nonconvex stochastic optimization under the p-BCM model; namely, match the parameter-dependent lower bound Ω(Δ1 L ε^{-2} + (Δ1 L ε^{-2})(σ/ε)^{p/(p−1)}).
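Restating the target in display form may make the two terms easier to compare (this is the same expression as above, with the second term's equivalent power of ε noted):

```latex
% Parameter-dependent lower bound under the p-BCM model (restated from the text):
\Omega\!\left( \Delta_1 L\, \varepsilon^{-2}
  + \Delta_1 L\, \varepsilon^{-2} \left(\frac{\sigma}{\varepsilon}\right)^{\frac{p}{p-1}} \right)
% The heavy-tailed term can equivalently be written as
%   \Delta_1 L\, \sigma^{p/(p-1)}\, \varepsilon^{-(3p-2)/(p-1)},
% since \varepsilon^{-2} (\sigma/\varepsilon)^{p/(p-1)} = \sigma^{p/(p-1)} \varepsilon^{-2 - p/(p-1)}.
```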


Background

The paper establishes that, when the problem-dependent parameters are known, minibatch Normalized SGD (NSGD) can be tuned to achieve the optimal minimax sample complexity under p-BCM noise, matching the lower bound in all parameters.

They also provide a parameter-free guarantee for NSGD, but its rate does not match the optimal parameter-dependent one. It therefore remains unresolved whether any algorithm can achieve the optimal heavy-tailed complexity without access to problem-dependent parameters, which would close the gap between parameter-dependent and parameter-free performance.
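For concreteness, a minimal sketch of the minibatch Normalized SGD update discussed above, x ← x − γ·g/‖g‖, where g is a minibatch-averaged stochastic gradient. Function names, the step-size choice, and the heavy-tailed noise model (Student-t, whose p-th moment is finite only for p below the degrees of freedom) are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def minibatch_nsgd(grad_fn, x0, step_size, batch_size, n_iters, rng):
    """Sketch of minibatch Normalized SGD: x <- x - step_size * g / ||g||.

    grad_fn(x, rng) returns one stochastic gradient sample; the minibatch
    average g is normalized, so each step has length at most step_size
    regardless of how heavy-tailed the noise is (no clipping threshold needed).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = np.mean([grad_fn(x, rng) for _ in range(batch_size)], axis=0)
        norm = np.linalg.norm(g)
        if norm > 0:
            x = x - step_size * g / norm
    return x

# Illustrative use: f(x) = 0.5 ||x||^2 with additive heavy-tailed
# (Student-t, df = 1.5) gradient noise, so the variance is infinite
# but the p-th moment is finite for p < 1.5 (a p-BCM-style regime).
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_t(1.5, size=x.shape)
x_final = minibatch_nsgd(noisy_grad, np.array([10.0, 10.0]),
                         step_size=0.1, batch_size=4, n_iters=500, rng=rng)
```

The point of the sketch is that normalization bounds every step deterministically, which is how NSGD avoids the tuning of a clipping level; the fixed step size here is still a tunable parameter, which is exactly what the parameter-free question targets.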

References

More importantly, it remains open whether the sample complexity that is optimal for parameter-dependent algorithms (the p-BCM lower bound stated above) can be achieved by any algorithm without knowledge of problem parameters.

From Gradient Clipping to Normalization for Heavy Tailed SGD (2410.13849 - Hübler et al., 17 Oct 2024) in Section 6 (Conclusion)