Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
134 tokens/sec
GPT-4o
10 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Linear Convergence Rate in Convex Setup is Possible! Gradient Descent Method Variants under $(L_0,L_1)$-Smoothness (2412.17050v2)

Published 22 Dec 2024 in math.OC

Abstract: The gradient descent (GD) method -- is a fundamental and likely the most popular optimization algorithm in ML, with a history traced back to a paper in 1847 (Cauchy, 1847). It was studied under various assumptions, including so-called $(L_0,L_1)$-smoothness, which received noticeable attention in the ML community recently. In this paper, we provide a refined convergence analysis of gradient descent and its variants, assuming generalized smoothness. In particular, we show that $(L_0,L_1)$-GD has the following behavior in the convex setup: as long as $|\nabla f(xk)| \geq \frac{L_0}{L_1}$ the algorithm has linear convergence in function suboptimality, and when $|\nabla f(xk)| < \frac{L_0}{L_1}$ is satisfied, $(L_0,L_1)$-GD has standard sublinear rate. Moreover, we also show that this behavior is common for its variants with different types of oracle: Normalized Gradient Descent as well as Clipped Gradient Descent (the case when the full gradient $\nabla f(x)$ is available); Random Coordinate Descent (when the gradient component $\nabla_{i} f(x)$ is available); Random Coordinate Descent with Order Oracle (when only $\text{sign} [f(y) - f(x)]$ is available). In addition, we also extend our analysis of $(L_0,L_1)$-GD to the strongly convex case.

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com