Desirability of relaxing rtol for cross-entropy objectives

Ascertain whether lowering the rtol threshold, so that the truncated SVD used by Sven retains more singular values during cross-entropy training, yields desirable outcomes, or whether it primarily increases overfitting without validation gains.

Background

In MNIST experiments using cross-entropy, the singular value spectrum becomes sharply hierarchical after early epochs, which may cause Sven to neglect directions under typical rtol settings.
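The effect of rtol can be illustrated with a generic relative-tolerance truncation rule: keep only singular values at least rtol times the largest one. This is a minimal sketch, not the paper's implementation of Sven; the function name, matrix, and rtol values are all illustrative assumptions.

```python
import numpy as np

def truncate_by_rtol(matrix, rtol):
    """Keep singular directions whose singular value >= rtol * s_max."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    keep = s >= rtol * s[0]  # s is returned in descending order
    return u[:, keep], s[keep], vt[keep, :]

# A sharply hierarchical spectrum, as described for later epochs:
# lowering rtol reintroduces directions that a larger rtol discards.
m = np.diag([10.0, 1.0, 0.01, 1e-5])
for rtol in (0.05, 5e-4, 1e-7):
    _, s_kept, _ = truncate_by_rtol(m, rtol)
    print(f"rtol={rtol:g}: kept {len(s_kept)} singular values")
```

With a hierarchical spectrum, the number of retained directions is very sensitive to rtol, which is why the choice matters more under cross-entropy than with a flatter spectrum.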

The authors note that reducing rtol could reintroduce these neglected directions and close the training-loss gap with baseline optimizers, but they question whether doing so is beneficial given the risk of overfitting.

They explicitly state uncertainty regarding the desirability of such a change.

References

It is not clear that this would be desirable, as the other optimizers simply appear to overfit more, achieving lower training losses without an attendant improvement in validation loss.

Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method  (2604.01279 - Bright-Thonney et al., 1 Apr 2026) in Appendix: Classification with Cross-Entropy