Adaptive learning rates beyond Euclidean spaces for geometry-aware optimizers

Develop layerwise adaptive learning rate schemes for geometry-aware optimization algorithms operating in non-Euclidean normed spaces that enable these optimizers to exploit heterogeneous, time-varying gradient noise across layers during deep neural network training.
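As a purely illustrative sketch (not taken from the paper), the per-layer quantity such a scheme would need to track can be estimated with exponential moving averages of each layer's gradient and squared gradient norm; the function and parameter names below are hypothetical.

```python
import numpy as np

def update_layer_noise_stats(stats, grads, beta=0.98):
    """Maintain per-layer EMAs of the gradient and of its squared norm.

    The gap  E[||g||^2] - ||E[g]||^2  is a crude proxy for a layer's
    gradient-noise level; tracking it over steps exposes how noise varies
    across layers and over training. Hypothetical, illustrative names only.
    """
    noise = {}
    for name, g in grads.items():
        mean, sq = stats.get(name, (np.zeros_like(g), 0.0))
        mean = beta * mean + (1.0 - beta) * g
        sq = beta * sq + (1.0 - beta) * float(np.sum(g * g))
        stats[name] = (mean, sq)
        noise[name] = sq - float(np.sum(mean * mean))  # per-layer noise proxy
    return noise
```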

Background

Recent geometry-aware optimizers (e.g., Muon and Scion) exploit the structure of neural network layers by measuring updates in operator norms and computing step directions via linear minimization oracles, but they typically assign a fixed learning rate to each group of layers sharing the same norm or shape.
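A minimal sketch of such an update, assuming the spectral operator norm for weight matrices: the linear minimization oracle replaces the gradient's singular values with ones, and one fixed learning rate is shared by every layer in the group. This simplifies the actual Muon/Scion implementations (which use Newton–Schulz iterations instead of an explicit SVD, plus shape-dependent scaling); all names here are illustrative.

```python
import numpy as np

def spectral_lmo(m: np.ndarray) -> np.ndarray:
    """LMO direction under the spectral norm: G = U S V^T  ->  U V^T."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def geometry_aware_step(weights, grads, momenta, lr=0.02, beta=0.95):
    """One Muon/Scion-flavoured step for a group of matrix-shaped layers:
    heavy-ball momentum followed by the operator-norm LMO direction,
    with a single fixed learning rate shared across the whole group."""
    for name, w in weights.items():
        m = beta * momenta[name] + grads[name]
        momenta[name] = m
        w -= lr * spectral_lmo(m)  # same lr for every layer in the group
    return weights
```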

Empirical evidence shows that gradient noise and local curvature are heterogeneous across layers and evolve over training, suggesting that uniform, fixed rates may be inefficient. While layerwise adaptive methods exist in Euclidean settings (e.g., AdamW variants, LAMB), they do not account for the non-Euclidean geometries leveraged by geometry-aware methods.
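For contrast, here is a simplified sketch of the Euclidean layerwise adaptivity used by LAMB (bias correction and trust-ratio clipping are omitted): the per-layer step size is rescaled by a ratio of Euclidean/Frobenius norms, which is exactly the geometric assumption that operator-norm methods replace. Names and hyperparameters are illustrative.

```python
import numpy as np

def lamb_like_step(w, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-6, wd=0.01):
    """Simplified LAMB-style update for one layer (no bias correction,
    no clipping of the trust ratio)."""
    m[:] = beta1 * m + (1.0 - beta1) * g          # first moment
    v[:] = beta2 * v + (1.0 - beta2) * g * g      # second moment
    update = m / (np.sqrt(v) + eps) + wd * w      # Adam-style direction
    # Layerwise trust ratio: both norms here are Euclidean (Frobenius),
    # which is the assumption geometry-aware optimizers move away from.
    trust = np.linalg.norm(w) / (np.linalg.norm(update) + eps)
    w -= lr * trust * update
    return w
```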

The paper frames the need to move beyond Euclidean adaptivity: designing adaptive learning-rate mechanisms that are compatible with geometry-aware, non-Euclidean optimization and that properly handle gradient noise that varies across layers and over training.

References

The key open question is how to design adaptive learning rates beyond standard Euclidean spaces, enabling geometry-aware optimizers to exploit heterogeneous gradient noise across layers and over the course of training (as illustrated by the paper's figure on gradient-noise heterogeneity across layers).