Papers
Topics
Authors
Recent
2000 character limit reached

On the Distributional Properties of Adaptive Gradients (2105.07222v1)

Published 15 May 2021 in cs.LG and stat.ML

Abstract: Adaptive gradient methods have achieved remarkable success in training deep neural networks on a wide variety of tasks. However, not much is known about the mathematical and statistical properties of this family of methods. This work aims at providing a series of theoretical analyses of its statistical properties justified by experiments. In particular, we show that when the underlying gradient obeys a normal distribution, the variance of the magnitude of the \textit{update} is an increasing and bounded function of time and does not diverge. This work suggests that the divergence of variance is not the cause of the need for warm up of the Adam optimizer, contrary to what is believed in the current literature.

Citations (4)

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.