ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization (2006.07065v1)

Published 12 Jun 2020 in math.OC and cs.LG

Abstract: Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention from the optimization and machine learning communities, both for leveraging life-long gradient information and for their profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective allows us to attach the properties of second moments to the first-moment iteration, and to propose a novel first-moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo achieves the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that ACMo has convergence comparable to SOTA Adam-type optimizers, and gains better generalization performance in most cases.
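
The abstract contrasts first-moment updates (SGD with momentum) with Adam-type adaptive methods that additionally track a second moment of the gradients. The sketch below is only meant to make that distinction concrete; it does not reproduce ACMo's angle-calibrated update (which is given in the paper itself), and the hyperparameter defaults and function names are illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's ACMo update): contrasts the
# second-moment scaling used by Adam-type adaptive methods with a plain
# first-moment (momentum) step, the two ingredients the abstract says
# ACMo tries to combine into a first-moment-only iteration.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: first moment m, second moment v, bias correction."""
    m = b1 * m + (1 - b1) * grad        # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)           # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate scaling
    return theta, m, v

def momentum_step(theta, grad, m, lr=1e-2, b1=0.9):
    """One SGD-with-momentum update: first moment only, no per-coordinate scaling."""
    m = b1 * m + grad
    theta = theta - lr * m
    return theta, m
```

Per the abstract, ACMo keeps only a first-moment state like `momentum_step` while aiming to retain the convergence behavior that adaptive methods obtain from the second-moment term; the calibrated update rule itself is specified in the paper.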

Authors (6)
  1. Xunpeng Huang (14 papers)
  2. Runxin Xu (30 papers)
  3. Hao Zhou (351 papers)
  4. Zhe Wang (574 papers)
  5. Zhengyang Liu (24 papers)
  6. Lei Li (1293 papers)
Citations (1)
