Understanding when additional singular values improve performance

Determine why retaining more singular values in Sven's truncated SVD of the per-sample loss Jacobian improves optimization on some tasks but not others, and characterize the task- and loss-landscape-dependent conditions under which a larger retained rank k helps or hurts.

Background

The authors observe that Sven’s performance often improves as the retained rank k increases, sometimes saturating around k≈B/2, but this pattern varies across datasets.
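
To make the role of k concrete, here is a minimal NumPy sketch of a rank-k truncated pseudoinverse step. It is an illustration under stated assumptions, not the authors' implementation. It assumes the update is the truncated pseudoinverse of a B x P Jacobian J applied to the vector of per-sample losses, with B taken to be the number of samples in the batch. The function name truncated_pinv_step and the lr parameter are hypothetical.

```python
import numpy as np

def truncated_pinv_step(J, losses, k, lr=1.0):
    """Parameter step from a rank-k truncated pseudoinverse of J.

    J      : (B, P) array, row i = gradient of sample i's loss (assumed layout)
    losses : (B,) array of per-sample losses
    k      : number of singular values to retain (hypothetical argument name)
    lr     : step-size multiplier (hypothetical, not taken from the paper)
    """
    U, s, Vt = np.linalg.svd(J, full_matrices=False)  # s is sorted descending
    k = min(k, s.size)
    # Project the losses onto the top-k left singular vectors, rescale by 1/s,
    # and map back to parameter space with the matching right singular vectors.
    coeffs = (U[:, :k].T @ losses) / s[:k]
    return -lr * (Vt[:k].T @ coeffs)
```

When J has full row rank and k equals that rank, this step is the minimum-norm solution of the linearized system J @ step = -losses. A smaller k drops the directions with the smallest singular values, which are also the ones the 1/s factor amplifies most.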

For 1D regression, too small a k leads to a failure to learn, whereas for random polynomial regression more aggressive truncation (smaller k) can improve performance.

They hypothesize that these effects relate to differences in singular value spectra across tasks but explicitly note that the underlying reason is not clear.
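
One way to probe the spectrum hypothesis is to inspect, for a single batch, how much of the loss vector each retained rank accounts for and how large the resulting step becomes. The sketch below is a generic truncated-SVD diagnostic, not an analysis from the paper. The name rank_sweep and its return values are hypothetical, and it assumes J has no exactly zero singular values.

```python
import numpy as np

def rank_sweep(J, losses):
    """Per-rank diagnostics for a (B, P) Jacobian J and (B,) loss vector.

    Returns, for k = 1..min(B, P):
      s         : singular values of J, sorted descending
      captured  : fraction of ||losses||^2 lying in the top-k left singular vectors
      step_norm : norm of the rank-k truncated pseudoinverse step
    Assumes every singular value is nonzero (otherwise 1/s is undefined).
    """
    U, s, _ = np.linalg.svd(J, full_matrices=False)
    coeffs = U.T @ losses                      # loss vector in the singular basis
    captured = np.cumsum(coeffs**2) / np.sum(losses**2)
    step_norm = np.sqrt(np.cumsum((coeffs / s) ** 2))
    return s, captured, step_norm
```

If captured saturates at small k while step_norm keeps growing, the extra singular values mostly add large step components that explain little of the loss vector. Whether this accounts for the differences observed across tasks is not established here.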

References

It is not immediately clear why additional singular values are beneficial in some cases and not others, but it is likely related to the overall loss landscape of the problem.

Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method (2604.01279 - Bright-Thonney et al., 1 Apr 2026) in Section 4 (Experiments), Regression Results