Study of the κ hyperparameter in Sven

Investigate the influence of the hyperparameter κ > 0 in the generalized decomposition L(θ) = Σ_α ((ℓ_α(θ))^{κ/2})^{2/κ} used by Sven. Specifically, analyze how defining the effective residuals ℛ^eff_α = (ℓ_α(θ))^{κ/2}, together with the corresponding Jacobian M, affects the pseudoinverse-based update, convergence behavior, and performance across regimes and losses.

Background

Sven generalizes the treatment of per-sample losses by introducing a κ-dependent residual definition ℛ^eff_α = (ℓ_α(θ))^{κ/2}, which reduces to treating √ℓ_α as an L2 residual when κ = 1 and uses ℓ_α itself when κ = 2.
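To make the definition concrete, here is a minimal NumPy sketch of the κ-dependent residuals and a pseudoinverse step built from them. It is an illustration under stated assumptions, not the authors' implementation: the Jacobian M is taken to be the derivative of the effective residuals with respect to θ (obtained by the chain rule from the per-sample loss gradients), the update is assumed to have the form -lr · pinv(M) · ℛ^eff, and the function names, the `lr` parameter, and the random test data are all hypothetical.

```python
import numpy as np


def effective_residuals(losses, kappa):
    """Effective residuals r_alpha = loss_alpha ** (kappa / 2).

    Assumes every per-sample loss is non-negative.
    """
    return np.power(losses, kappa / 2.0)


def effective_jacobian(losses, loss_jac, kappa):
    """Jacobian of the effective residuals with respect to theta (assumed form).

    Chain rule: d r_alpha / d theta
        = (kappa / 2) * loss_alpha ** (kappa / 2 - 1) * d loss_alpha / d theta.
    `loss_jac` has shape (n_samples, n_params), one loss gradient per row.
    """
    scale = (kappa / 2.0) * np.power(losses, kappa / 2.0 - 1.0)
    return scale[:, None] * loss_jac


def pinv_update(losses, loss_jac, kappa, lr=1.0):
    """Assumed pseudoinverse-based step: -lr * pinv(M) @ r."""
    r = effective_residuals(losses, kappa)
    M = effective_jacobian(losses, loss_jac, kappa)
    return -lr * np.linalg.pinv(M) @ r


# Hypothetical data: 4 samples, 6 parameters, strictly positive losses.
rng = np.random.default_rng(0)
losses = rng.uniform(0.5, 2.0, size=4)
loss_jac = rng.normal(size=(4, 6))

for kappa in (1.0, 2.0):
    step = pinv_update(losses, loss_jac, kappa)
    print(f"kappa={kappa}: step norm = {np.linalg.norm(step):.4f}")
```

In this sketch, κ = 2 leaves both the residuals and the Jacobian unchanged. For κ < 2 the chain-rule factor carries a negative power of the loss, so it grows without bound as a loss approaches zero, which is one way fractional powers can become numerically delicate.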

The authors default to κ = 2 in practice to avoid pathologies that fractional powers introduce for losses such as cross-entropy. They also note that in the under-parameterized limit κ only rescales the update, whereas in the over-parameterized regime κ changes the Jacobian structure non-trivially.

They explicitly defer a detailed investigation of how κ shapes the optimizer’s behavior, leaving its systematic study as future work.

References

We leave a detailed study of the κ hyperparameter to future work.

— Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method (2604.01279 - Bright-Thonney et al., 1 Apr 2026) in Section 2 (Methodology), discussion following the κ-generalization of the loss decomposition
