Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization (2110.09057v1)

Published 18 Oct 2021 in cs.LG and math.OC

Abstract: Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. Our proposed adaptive heavy ball momentum can improve stochastic gradient descent (SGD) and Adam. SGD and Adam with the newly designed adaptive momentum are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks, including image classification, language modeling, and machine translation. Finally, we provide convergence guarantees for SGD and Adam with the proposed adaptive momentum.
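For context, the quadratic-optimization result the abstract alludes to is Polyak's closed-form optimal heavy-ball momentum for a quadratic objective whose Hessian eigenvalues lie in [mu, L]. The sketch below is a minimal illustration of that classical fact, not the paper's algorithm: the function names, the fixed curvature bounds mu and L, and the toy problem are assumptions made here for illustration, whereas the paper's contribution is to adapt the momentum per iteration (for SGD and Adam) without knowing such bounds.

```python
from math import sqrt

def quadratic_optimal_momentum(mu, L):
    """Closed-form optimal heavy-ball momentum for a quadratic objective
    whose Hessian eigenvalues lie in [mu, L] (Polyak):
        beta* = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))) ** 2
    Here mu and L are assumed-known curvature bounds; the paper instead
    chooses the momentum adaptively at each step."""
    r = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))
    return r * r

def heavy_ball_step(x, x_prev, grad, lr, beta):
    """One Polyak heavy-ball update:
        x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1})"""
    return [xi - lr * gi + beta * (xi - xpi)
            for xi, gi, xpi in zip(x, grad, x_prev)]

# Toy usage: minimize f(x) = 0.5 * (a1*x1^2 + a2*x2^2), an ill-conditioned quadratic.
a = [1.0, 100.0]                          # Hessian eigenvalues -> mu = 1, L = 100
mu, L = min(a), max(a)
beta = quadratic_optimal_momentum(mu, L)  # ~0.669
lr = 4.0 / (sqrt(L) + sqrt(mu)) ** 2      # matching optimal step size for this quadratic

x_prev = [1.0, 1.0]
x = [1.0, 1.0]
for _ in range(200):
    grad = [ai * xi for ai, xi in zip(a, x)]  # gradient of the quadratic
    x, x_prev = heavy_ball_step(x, x_prev, grad, lr, beta), x
print(x)  # both coordinates are driven close to the minimizer at the origin
```

With the optimal (lr, beta) pair, the iterates contract at the accelerated rate (sqrt(kappa) - 1)/(sqrt(kappa) + 1) even on ill-conditioned quadratics, which is the behavior the paper's adaptive momentum aims to recover without hand-tuning.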

Authors (5)
  1. Tao Sun (143 papers)
  2. Huaming Ling (5 papers)
  3. Zuoqiang Shi (75 papers)
  4. Dongsheng Li (240 papers)
  5. Bao Wang (70 papers)
Citations (13)
