Underlying reason for variance adaptation’s effectiveness in deep networks
Determine the underlying reason for the effectiveness of variance adaptation—elementwise normalization of parameter updates by an exponential moving average of the uncentered squared gradients—in training deep neural networks.
References
We believe a fruitful open question is to identify the underlying reason behind variance-adaptation's effectiveness in deep networks.
— What Really Matters in Matrix-Whitening Optimizers?
(2510.25000 - Frans et al., 28 Oct 2025) in Discussion and Conclusion, Future work