Desirability of relaxing rtol for cross-entropy objectives
Determine whether lowering the rtol threshold, so that the truncated SVD retains more singular values, is actually beneficial when training with Sven on cross-entropy objectives, or whether it mainly lowers training loss (overfitting) without improving validation loss.
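To make the role of rtol concrete, the following is a minimal NumPy sketch of a truncated-SVD pseudoinverse solve. It is not taken from the Sven paper or its code. It assumes the common convention (the one `numpy.linalg.pinv` uses) that a singular value is kept only when it exceeds rtol times the largest singular value. The function name `truncated_pinv_step`, the matrix `J`, the target vector `r`, and the synthetic singular values are all hypothetical, chosen only for illustration.

```python
import numpy as np

def truncated_pinv_step(J, r, rtol):
    """Minimum-norm solution of J @ delta = r restricted to the singular
    directions with s_i > rtol * s_max (assumed cutoff rule).
    Returns (delta, number of singular values kept)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)  # s is sorted descending
    keep = s > rtol * s[0]
    # Project r onto the kept left singular vectors and rescale by 1 / s_i.
    coeffs = (U[:, keep].T @ r) / s[keep]
    delta = Vt[keep].T @ coeffs
    return delta, int(keep.sum())

# Synthetic 6 x 8 matrix with a known, rapidly decaying spectrum
# (illustrative only; not data from any experiment).
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V0, _ = np.linalg.qr(rng.standard_normal((8, 8)))
s_true = np.array([1.0, 0.5, 0.1, 1e-2, 1e-4, 1e-6])
J = U0 @ np.diag(s_true) @ V0[:, :6].T
r = rng.standard_normal(6)

for rtol in (3e-2, 3e-3, 3e-5, 3e-7):
    delta, k = truncated_pinv_step(J, r, rtol)
    print(f"rtol={rtol:.0e}  kept={k}  step_norm={np.linalg.norm(delta):.3e}")
```

By construction, the four rtol values retain 3, 4, 5, and 6 singular values respectively. Each newly admitted direction is scaled by 1 / s_i, so the step norm can only grow as rtol is lowered. That shows how a lower rtol lets the update fit the training targets along ever weaker directions. The sketch says nothing about validation loss, which is the open question.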
References
It is not clear that this would be desirable, as the other optimizers simply appear to overfit more, achieving lower training losses without an attendant improvement in validation loss.
— Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
(2604.01279 - Bright-Thonney et al., 1 Apr 2026) in Appendix: Classification with Cross-Entropy