Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
175 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Notes on Worst-case Inefficiency of Gradient Descent Even in R^2 (2008.07513v2)

Published 17 Aug 2020 in cs.LG and stat.ML

Abstract: Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent is able to escape saddle points asymptotically and converge to local minimizers [Lee et. al. 2016]. Recent studies also show a perturbed version of gradient descent is enough to escape saddle points efficiently [Jin et. al. 2015, Ge et. al. 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, with non-pathological two dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical result. Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.

Summary

We haven't generated a summary for this paper yet.