EMA in the modular norm framework
Characterize the theoretical role of exponential moving averages (EMA) within the modular norm optimization framework. Specifically, derive how EMA-based first- and second-moment accumulation interacts with layer-wise duality maps and induced operator norms, and establish principled guidance for choosing EMA parameters in norm-based optimizers.
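To make the question concrete, here is a minimal NumPy sketch under assumptions of our own, not a construction taken from the cited papers. Linear layers use the spectral-norm duality map (keep the singular vectors and set every singular value to 1), EMAs are bias-corrected, and the helper names `ema`, `spectral_dualize`, `momentum_direction`, and `adam_direction` are hypothetical. The sketch contrasts two placements of the first-moment EMA relative to the duality map. It also checks that Adam with both EMAs switched off reduces to sign descent, the steepest-descent direction under the entrywise max (ℓ∞) norm.

```python
import numpy as np


def ema(prev, x, beta):
    """One EMA step: beta * prev + (1 - beta) * x."""
    return beta * prev + (1.0 - beta) * x


def spectral_dualize(g):
    """Duality map for the spectral norm: keep g's singular vectors, set all singular values to 1."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def momentum_direction(grads, beta, order):
    """Bias-corrected first-moment EMA for one layer, combined with the spectral duality map.

    Hypothetical helper. order == "ema_then_dualize": average raw gradients, then dualize.
    order == "dualize_then_ema": dualize each gradient, then average the results.
    """
    m = np.zeros_like(grads[0])
    for t, g in enumerate(grads, start=1):
        x = g if order == "ema_then_dualize" else spectral_dualize(g)
        m = ema(m, x, beta)
    m_hat = m / (1.0 - beta ** t)  # undo the bias from zero initialization
    return spectral_dualize(m_hat) if order == "ema_then_dualize" else m_hat


def adam_direction(grads, beta1, beta2, eps=1e-12):
    """Bias-corrected Adam direction m_hat / (sqrt(v_hat) + eps) from first- and second-moment EMAs."""
    m = np.zeros_like(grads[0])
    v = np.zeros_like(grads[0])
    for t, g in enumerate(grads, start=1):
        m = ema(m, g, beta1)
        v = ema(v, g * g, beta2)
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return m_hat / (np.sqrt(v_hat) + eps)


# Illustrative random gradients for a single 4x3 linear layer (not real training data).
rng = np.random.default_rng(0)
grads = [rng.standard_normal((4, 3)) for _ in range(5)]

a = momentum_direction(grads, 0.9, "ema_then_dualize")
b = momentum_direction(grads, 0.9, "dualize_then_ema")
# a is the dual of the averaged gradient, so its spectral norm is 1.
# b is a convex combination of unit-spectral-norm matrices, so its spectral norm is at most 1.
print(np.linalg.norm(a, 2), np.linalg.norm(b, 2))

# With beta1 = beta2 = 0 each EMA keeps only the latest gradient,
# and the Adam direction collapses to sign(g).
print(np.allclose(adam_direction(grads, 0.0, 0.0), np.sign(grads[-1])))
```

The sketch only shows that where the EMA sits relative to the duality map changes the geometry of the update: averaging before dualizing always yields a unit-norm step, while averaging after dualizing shrinks the step when successive gradient directions disagree. An EMA with parameter β weights the gradient from k steps back by (1 − β)β^k, so its memory spans roughly 1/(1 − β) steps; the sketch does not settle which placement or which β is principled.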
References
Bernstein and Newhouse acknowledge this gap in their “Norm Anthology” paper, noting that understanding EMA's role in the framework is “perhaps still an open problem”.
— Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale
(2512.18373 - Nagwekar, 20 Dec 2025) in Subsubsection “Missing Theory on EMA” within Section “Limitations of the Modular Norm Framework”