Characterize covariance structure favoring GN^{-1/2} over GN^{-1} (via condition number ratio)

Characterize the set of covariance matrices Cov_x for which r(Cov_x) > 1, where r(Cov_x) := cond(Cov_x^{1/2} diag(Cov_x^{-1}) Cov_x^{1/2}) / cond(Cov_x^{1/2} diag(Cov_x^{-1/2}) Cov_x^{1/2}). Such a characterization would identify when, under the identity basis for diagonal preconditioning, using the Gauss–Newton diagonal with power −1/2 yields a smaller (more favorable) preconditioned-Hessian condition number than using power −1.
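To make the definition concrete, the ratio can be evaluated numerically for any given covariance matrix. The following is a minimal sketch, not the authors' code. It assumes that diag(Cov_x^{-p}) means the diagonal matrix whose entries are the diagonal entries of Cov_x raised to the power −p, and that Cov_x is symmetric positive definite. The helper names `psd_sqrt`, `precond_cond`, and `gn_ratio` are hypothetical.

```python
import numpy as np


def psd_sqrt(cov):
    """Symmetric square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def precond_cond(cov, power):
    """Condition number of Cov^{1/2} D Cov^{1/2}.

    ASSUMPTION: D = diag(Cov)^{-power}, i.e. the diagonal entries of Cov
    raised elementwise to the power -power. This is one reading of the
    notation diag(Cov_x^{-p}) used in the problem statement.
    """
    root = psd_sqrt(cov)
    d = np.diag(cov) ** (-power)          # diagonal preconditioner entries
    m = root @ np.diag(d) @ root          # symmetrically preconditioned Hessian
    eigvals = np.linalg.eigvalsh((m + m.T) / 2)  # symmetrize against round-off
    return eigvals[-1] / eigvals[0]


def gn_ratio(cov):
    """r(Cov) = cond with power -1 divided by cond with power -1/2."""
    return precond_cond(cov, 1.0) / precond_cond(cov, 0.5)


if __name__ == "__main__":
    # Diagonal example: power -1 gives the identity (cond 1), power -1/2
    # gives diag(1, 2) (cond 2), so r is approximately 0.5.
    print(gn_ratio(np.diag([1.0, 4.0])))
```

A value above 1 returned by `gn_ratio` would indicate that, for that covariance, power −1/2 gives the smaller condition number. A value below 1 would favor power −1.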

Background

The paper studies diagonal preconditioners derived from Adam and Gauss–Newton (GN), analyzing how basis choice and gradient noise affect convergence. A key comparison concerns which GN power (−1 vs −1/2) is preferable when the diagonal is taken in the identity basis.

The authors show that the convergence rate of preconditioned gradient descent depends on the condition number of the preconditioned Hessian, and they define the ratio r(Cov_x) to compare the condition numbers induced by GN powers −1 and −1/2. They provide examples where r(Cov_x) > 1 (favoring GN^{-1/2}) and others where r(Cov_x) < 1 (favoring GN^{-1}).
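As a sanity check on the definition (a worked special case, not an example taken from the paper), consider a diagonal covariance $\Cov_x = \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2)$ with all $\sigma_i > 0$. Here the diagonal of $\Cov_x$ is the whole matrix, so the two preconditioned Hessians can be computed directly:

```latex
\Cov_x^{1/2}\,\operatorname{diag}(\Cov_x^{-1})\,\Cov_x^{1/2} = I,
\qquad
\Cov_x^{1/2}\,\operatorname{diag}(\Cov_x^{-1/2})\,\Cov_x^{1/2}
  = \operatorname{diag}(\sigma_1, \dots, \sigma_d),
\qquad
r(\Cov_x) = \frac{1}{\max_i \sigma_i / \min_i \sigma_i} \le 1 .
```

So a diagonal covariance never satisfies r(Cov_x) > 1, and any covariance in the set to be characterized must have nonzero off-diagonal entries.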

While empirical and constructive examples are given for both cases, a full characterization of the covariance matrices for which r(Cov_x) > 1 remains unresolved, and the authors explicitly leave this characterization to future work.

References

Characterizing covariance matrices for which $r(\Cov_x) > 1$ is left as future work.

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise (2510.13680 - Liu et al., 15 Oct 2025) in Appendix A.3 (Comparing GN powers)