Sufficiency of eigenvalue-corrected Shampoo (EShampoo) under μP
Investigate whether eigenvalue-corrected Shampoo (EShampoo), which eliminates the need for learning-rate grafting, remains sufficient under the Maximal Update Parameterization (μP), where per-layer learning-rate scaling is already governed by width-dependent initialization rules, and determine the compatibility conditions between μP scaling and EShampoo's preconditioner corrections. A minimal sketch of the two mechanisms in combination is given below.
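To make the setup concrete, the following Python sketch combines a μP-style per-layer learning-rate rule (hidden weight matrices scaled as 1/fan_in) with a SOAP-style eigenvalue-corrected Shampoo step that tracks a diagonal second moment in the preconditioner's eigenbasis rather than grafting the update norm from another optimizer. The function and class names (`mup_lr`, `EShampooLayerState`, `eshampoo_update`) and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mup_lr(base_lr, fan_in, layer_type="hidden"):
    # muP (assumed standard table): hidden weight matrices get a learning rate
    # scaled ~ 1/fan_in; other parameter groups keep the base learning rate.
    return base_lr / fan_in if layer_type == "hidden" else base_lr

class EShampooLayerState:
    """State for one weight matrix W (m x n) under eigenvalue-corrected Shampoo."""
    def __init__(self, m, n, eps=1e-12):
        self.L = eps * np.eye(m)   # left Kronecker factor  (EMA of G @ G.T)
        self.R = eps * np.eye(n)   # right Kronecker factor (EMA of G.T @ G)
        self.D = np.zeros((m, n))  # per-coordinate second moment in the eigenbasis
        self.eps = eps

def eshampoo_update(state, G, beta2=0.99):
    """One eigenvalue-corrected Shampoo step (hypothetical sketch).

    Instead of grafting the step size from Adam/SGD, a diagonal second moment
    is maintained directly in the preconditioner eigenbasis; this is the
    "eigenvalue correction" that removes the need for grafting.
    """
    # Accumulate Kronecker factors of the gradient covariance.
    state.L = beta2 * state.L + (1 - beta2) * (G @ G.T)
    state.R = beta2 * state.R + (1 - beta2) * (G.T @ G)

    # Eigenbases of the two factors (recomputed every step here for clarity;
    # in practice this would be amortized over many steps).
    _, QL = np.linalg.eigh(state.L)
    _, QR = np.linalg.eigh(state.R)

    # Rotate the gradient into the joint eigenbasis and apply an Adam-style
    # diagonal correction there.
    G_rot = QL.T @ G @ QR
    state.D = beta2 * state.D + (1 - beta2) * G_rot**2
    step_rot = G_rot / (np.sqrt(state.D) + state.eps)

    # Rotate back to parameter space.
    return QL @ step_rot @ QR.T

# Usage: a hidden layer of width n receives lr = base_lr / fan_in under muP.
m, n = 64, 64
state = EShampooLayerState(m, n)
W = np.random.randn(m, n) / np.sqrt(n)
G = np.random.randn(m, n)
lr = mup_lr(2e-2, fan_in=n, layer_type="hidden")
W -= lr * eshampoo_update(state, G)
```

The open question is whether the 1/fan_in scaling applied in `mup_lr` still induces width-consistent updates once the step direction is reshaped by the eigenbasis correction, or whether the μP exponents would need to be rederived for this preconditioner.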
References
Whether such corrections remain sufficient under μP, where per-layer learning rate scaling is already governed by width-dependent initialization rules, remains an open question.
— Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale
(2512.18373 - Nagwekar, 20 Dec 2025) in Subsection “Interplay with μP and Optimizer Choice” within Section “Learning Rate Schedules”