
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization (1808.05671v4)

Published 16 Aug 2018 in cs.LG, math.OC, and stat.ML

Abstract: Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
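To make the class of methods concrete, here is a minimal NumPy sketch of the AMSGrad update, one member of the family analyzed in the paper (RMSProp and AdaGrad differ mainly in how the second-moment term is accumulated). The toy quadratic objective, step sizes, and hyperparameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def amsgrad_step(x, m, v, v_hat, grad, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update on parameters x given a stochastic gradient."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    v_hat = np.maximum(v_hat, v)              # running max keeps effective step non-increasing
    x = x - lr * m / (np.sqrt(v_hat) + eps)   # coordinate-wise adaptive step
    return x, m, v, v_hat

# Toy smooth objective f(x) = ||x||^2 / 2, whose gradient is x itself.
x = np.array([1.0, -2.0])
m = v = v_hat = np.zeros_like(x)
for _ in range(5000):
    x, m, v, v_hat = amsgrad_step(x, m, v, v_hat, grad=x)
print(np.linalg.norm(x))  # gradient norm shrinks toward a first-order stationary point
```

The `np.maximum` line is what distinguishes AMSGrad from RMSProp: it forces the per-coordinate effective step size to be non-increasing, which is central to the convergence guarantees the paper studies.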

Authors (5)
  1. Dongruo Zhou (51 papers)
  2. Jinghui Chen (50 papers)
  3. Yuan Cao (201 papers)
  4. Ziyan Yang (15 papers)
  5. Quanquan Gu (198 papers)
Citations (150)
