Alternative functional forms for problem-constant scaling

Investigate functional forms, beyond the shifted power-law fits used in the paper, for modeling how the smoothness constant L, the KL-condition constant μ, and the norm-equivalence constant ρ depend on the number of layers, the embedding dimension, and the batch size, and determine which forms better capture the empirical behavior of these constants.

Background

To apply their theory, the authors empirically estimate how the problem-dependent constants L, μ, and ρ scale with architectural parameters and batch size by fitting shifted power-law relationships. These fits are then used to guide hyperparameter transfer across scales.
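As a rough illustration of what such a fit could look like, the sketch below fits a shifted power law to synthetic data. The parameterization y = a * (x + b)^p, the grid of candidate shifts, and all numbers are assumptions made for this example; the text above does not state the exact form or fitting procedure used in the paper.

```python
import math

def fit_line(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope

def fit_shifted_power_law(xs, ys, shifts):
    """Fit y = a * (x + b) ** p (assumed form) by scanning candidate shifts b.

    For a fixed b the model is linear in log space:
        log y = log a + p * log(x + b),
    so each candidate b only needs a straight-line fit.
    """
    log_y = [math.log(y) for y in ys]
    best = None
    for b in shifts:
        if min(xs) + b <= 0:
            continue  # log(x + b) would be undefined
        log_x = [math.log(x + b) for x in xs]
        log_a, p = fit_line(log_x, log_y)
        sse = sum((log_a + p * u - v) ** 2 for u, v in zip(log_x, log_y))
        if best is None or sse < best["sse_log"]:
            best = {"a": math.exp(log_a), "b": b, "p": p, "sse_log": sse}
    return best

# Synthetic stand-in for estimates of a constant at several depths
# (hypothetical values, not measurements from the paper).
layers = [2, 4, 8, 16, 32]
const_est = [3.0 * (n + 4.0) ** 0.5 for n in layers]

shifts = [0.5 * k for k in range(21)]  # candidate shifts 0.0, 0.5, ..., 10.0
fit = fit_shifted_power_law(layers, const_est, shifts)
print(fit)
```

Scanning the shift keeps the example dependency-free; a nonlinear least-squares routine would be a natural replacement when the shift should be optimized continuously.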

They acknowledge that the choice of functional form is flexible and that other dependencies may better capture observed behavior, explicitly leaving the exploration of alternatives for future work.
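One way to explore alternatives is to fit several candidate forms to the same estimates and rank them with an information criterion. The sketch below is a minimal example under stated assumptions: the three candidate forms, the use of AIC on log-space residuals, and the synthetic data are all illustrative choices, not forms or results taken from the paper.

```python
import math

def fit_line(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope

# Hypothetical candidates, each linear in log y after transforming x.
CANDIDATES = {
    "power_law": math.log,        # log y = alpha + p * log x
    "exponential": lambda x: x,   # log y = alpha + c * x
    "stretched_exp": math.sqrt,   # log y = alpha + c * sqrt(x)
}

def compare_forms(xs, ys):
    """Fit each candidate in log space and report AIC (up to a constant)."""
    n = len(xs)
    log_y = [math.log(y) for y in ys]
    results = {}
    for name, feature in CANDIDATES.items():
        us = [feature(x) for x in xs]
        intercept, slope = fit_line(us, log_y)
        sse = sum((intercept + slope * u - v) ** 2 for u, v in zip(us, log_y))
        k = 2  # intercept and slope
        aic = n * math.log(sse / n) + 2 * k
        results[name] = {"intercept": intercept, "slope": slope, "aic": aic}
    return results

# Synthetic stand-in for estimates of a constant at several batch sizes
# (hypothetical values with small fixed perturbations, not paper data).
batch_sizes = [8, 16, 32, 64, 128, 256]
noise = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98]
const_est = [5.0 * b ** -0.5 * e for b, e in zip(batch_sizes, noise)]

results = compare_forms(batch_sizes, const_est)
best = min(results, key=lambda name: results[name]["aic"])
print(best, results[best])
```

Because every candidate here has the same number of parameters, the AIC ranking reduces to comparing residual sums of squares; the penalty term only matters once forms with extra parameters, such as a shift, are added. With only a handful of measurements, a small-sample correction or held-out points may give a more reliable comparison.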

References

We leave the exploration of other functional dependencies to future work.

— On the Role of Batch Size in Stochastic Conditional Gradient Methods (2603.21191 - Islamov et al., 22 Mar 2026) in Section 6.5 (Estimating Problem-Dependent Constants)
