A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes (2102.06356v3)

Published 12 Feb 2021 in cs.LG and stat.ML

Abstract: Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
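To make the "layer-wise normalization" mentioned in the abstract concrete, the sketch below shows how a LARS-style update differs from plain Heavy-ball momentum: each layer's step is rescaled by a trust ratio derived from the ratio of the parameter norm to the gradient norm. This is a minimal illustrative sketch, not the authors' code; the function name, hyperparameter names, and default values are assumptions.

```python
import numpy as np

def lars_style_step(params, grads, momenta,
                    lr=0.1, eta=0.001, beta=0.9, weight_decay=1e-4):
    """One LARS-style update: Heavy-ball momentum whose per-layer step is
    rescaled by a layer-wise trust ratio. Illustrative sketch only."""
    new_params, new_momenta = [], []
    for w, g, m in zip(params, grads, momenta):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # Layer-wise normalization: local learning rate proportional to
        # ||w|| / (||g|| + weight_decay * ||w||).
        if w_norm > 0 and g_norm > 0:
            local_lr = eta * w_norm / (g_norm + weight_decay * w_norm)
        else:
            local_lr = 1.0
        # Heavy-ball momentum applied to the rescaled (and L2-regularized) step.
        m = beta * m + lr * local_lr * (g + weight_decay * w)
        new_params.append(w - m)
        new_momenta.append(m)
    return new_params, new_momenta


# Tiny usage example with two "layers" of random parameters.
params = [np.random.randn(4, 4), np.random.randn(4)]
grads = [np.random.randn(4, 4), np.random.randn(4)]
momenta = [np.zeros_like(p) for p in params]
params, momenta = lars_style_step(params, grads, momenta)
```

LAMB applies the same layer-wise rescaling idea to Adam's update rather than to Heavy-ball momentum; the paper's argument is that, with careful tuning, the unnormalized base optimizers already perform as well at large batch sizes.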

Authors (5)
  1. Zachary Nado (23 papers)
  2. Justin M. Gilmer (1 paper)
  3. Christopher J. Shallue (16 papers)
  4. Rohan Anil (32 papers)
  5. George E. Dahl (27 papers)
Citations (27)