Parameter-free attainment of optimal heavy-tailed sample complexity
Determine whether there exists an algorithm that, without knowledge of any problem-dependent parameter (the smoothness constant L, the initial optimality gap Δ1, the noise scale σ, or the tail index p), achieves the optimal heavy-tailed sample complexity for finding an ε-stationary point in nonconvex stochastic optimization under the p-BCM model. That is, determine whether such an algorithm can match the lower bound Ω(Δ1 L ε^{-2} + (Δ1 L ε^{-2})(σ/ε)^{p/(p−1)}), which is currently known to be attained only by algorithms tuned with these parameters.
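To make the target rate concrete, the following minimal sketch evaluates the expression inside the Ω(·) above with all constants dropped. The function name and the input checks are illustrative assumptions, not part of the source, and the range p ∈ (1, 2] is assumed as the usual heavy-tailed regime.

```python
def heavy_tailed_lower_bound(delta1, L, sigma, eps, p):
    """Evaluate Delta1*L*eps^-2 * (1 + (sigma/eps)^(p/(p-1))).

    Hypothetical helper: constants hidden by the Omega notation are omitted,
    so the value only indicates how the bound scales.
    """
    # Assumption: the tail index lies in (1, 2]; p = 2 is bounded variance.
    if not (1.0 < p <= 2.0):
        raise ValueError("p must lie in (1, 2]")
    if delta1 <= 0 or L <= 0 or eps <= 0 or sigma < 0:
        raise ValueError("delta1, L, eps must be positive and sigma nonnegative")

    noiseless_term = delta1 * L / eps**2          # Delta1 * L * eps^-2
    noise_factor = (sigma / eps) ** (p / (p - 1.0))  # (sigma/eps)^(p/(p-1))
    return noiseless_term * (1.0 + noise_factor)


if __name__ == "__main__":
    # Compare the scaling for bounded variance (p = 2) and a heavier tail (p = 1.5).
    for p in (2.0, 1.5):
        print(p, heavy_tailed_lower_bound(1.0, 1.0, 1.0, 0.5, p))
```

For p = 2 the noise term scales as Δ1 L σ^2 ε^{-4}, and as p approaches 1 the exponent p/(p−1) grows without bound, which is why heavier tails make the bound so much larger.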
References
More importantly, it remains open whether the sample complexity that is optimal for parameter-dependent algorithms (eq:pbcm_lower_bound) can be achieved by any algorithm without knowledge of problem parameters.
— From Gradient Clipping to Normalization for Heavy Tailed SGD
(2410.13849 - Hübler et al., 17 Oct 2024) in Section 6 (Conclusion)