Parameter-Agnostic Optimization under Relaxed Smoothness (2311.03252v1)

Published 6 Nov 2023 in math.OC, cs.LG, and stat.ML

Abstract: Tuning hyperparameters, such as the stepsize, presents a major challenge of training machine learning models. To address this challenge, numerous adaptive optimization algorithms have been developed that achieve near-optimal complexities, even when stepsizes are independent of problem-specific parameters, provided that the loss function is $L$-smooth. However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of introducing an exponential term dependent on $L_1$ in the complexity. We further establish that this exponential term is inevitable to such schemes by introducing a theoretical framework of lower bounds tailored explicitly for parameter-agnostic algorithms. Interestingly, in deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search. To the best of our knowledge, these findings represent the first parameter-agnostic convergence results under the generalized smoothness condition. Our empirical experiments further confirm our theoretical insights.

Authors (4)

Florian Hübler (2 papers)
Junchi Yang (11 papers)
Xiang Li (1003 papers)
Niao He (91 papers)

Citations (8)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Parameter-Agnostic Optimization under Relaxed Smoothness (2311.03252v1)

Summary

Related Papers