
Toward a Unified Theory of Gradient Descent under Generalized Smoothness (2412.11773v2)

Published 16 Dec 2024 in math.OC

Abstract: We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well-known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \leq L$), the optimal point minimizing the quadratic upper bound $f(x_k) + \langle\nabla f(x_k), x_{k+1} - x_k\rangle + \frac{L}{2} \|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \leq \ell(\|\nabla f(x)\|)$). In this case, we derive the step size $$\gamma_k = \int_{0}^{1} \frac{d v}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}.$$ Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.
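To make the step size rule concrete, the following is a minimal sketch (not the paper's code) that evaluates the integral numerically for a user-supplied $\ell$. The function names `step_size` and `step_size_l0l1`, the midpoint quadrature, and the choice $\ell(s) = L_0 + L_1 s$ as an example are assumptions made here for illustration. For that $\ell$, the integral works out to $\gamma_k = \frac{1}{L_1 g} \ln\frac{L_0 + 2 L_1 g}{L_0 + L_1 g}$ with $g = \|\nabla f(x_k)\|$, and it reduces to $1/L_0$ when $L_1 = 0$.

```python
import math


def step_size(grad_norm, ell, n=1000):
    """Approximate gamma_k = int_0^1 dv / ell(g + g*v) with the midpoint rule.

    grad_norm is g = ||grad f(x_k)||; ell is a positive function of a
    nonnegative scalar. Both the name and the quadrature are illustrative.
    """
    h = 1.0 / n
    return sum(h / ell(grad_norm * (1.0 + (i + 0.5) * h)) for i in range(n))


def step_size_l0l1(grad_norm, L0, L1):
    """Closed form of the same integral for the example ell(s) = L0 + L1 * s."""
    a = L1 * grad_norm
    if a == 0.0:
        # Constant ell: the integral collapses to the classical 1 / L0.
        return 1.0 / L0
    return math.log((L0 + 2.0 * a) / (L0 + a)) / a


# Constant ell(s) = L recovers the classical step size 1 / L.
gamma_const = step_size(3.0, lambda s: 2.0)

# Example ell(s) = 1 + 2 s: quadrature and closed form should agree.
gamma_num = step_size(3.0, lambda s: 1.0 + 2.0 * s)
gamma_exact = step_size_l0l1(3.0, 1.0, 2.0)
```

In this sketch the step shrinks as the gradient norm grows whenever $\ell$ is increasing, which matches the intuition that the local curvature bound is larger far from stationary points.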


Authors (1)
