Alternative functional forms for problem-constant scaling
Investigate alternative functional forms for modeling how the smoothness constant L, the KL-condition constant μ, and the norm-equivalence constant ρ depend on the number of layers, the embedding dimension, and the batch size. The paper fits these dependencies with shifted power laws; the goal is to determine which other forms better capture the empirical behavior of the constants.
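One way such a comparison could be set up is sketched below. Everything in it is an assumption made for illustration rather than the paper's procedure: the shifted power law is parameterized as c + a·x^p (the paper's exact form is not reproduced here), the two alternatives (logarithmic and exponential saturation) are arbitrary candidates, AIC is just one possible selection criterion, and the data are synthetic placeholders, not measurements of L, μ, or ρ.

```python
import math

def fit_affine(g, y):
    """Closed-form least squares for y ~ c + a * g. Returns (a, c, rss)."""
    n = len(y)
    g_mean = sum(g) / n
    y_mean = sum(y) / n
    sgg = sum((gi - g_mean) ** 2 for gi in g)
    sgy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, y))
    a = sgy / sgg
    c = y_mean - a * g_mean
    rss = sum((yi - (c + a * gi)) ** 2 for gi, yi in zip(g, y))
    return a, c, rss

def aic(rss, n, k):
    """Gaussian-error AIC up to an additive constant; lower is better."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k

def fit_shifted_power(x, y):
    """y ~ c + a * x**p (assumed parameterization); p chosen by grid search."""
    best = None
    for i in range(-40, 41):
        if i == 0:
            continue  # p = 0 makes x**p constant, so a and c are not identifiable
        p = i / 20
        a, c, rss = fit_affine([xi ** p for xi in x], y)
        if best is None or rss < best["rss"]:
            best = {"a": a, "c": c, "p": p, "rss": rss}
    best["aic"] = aic(best["rss"], len(y), 3)  # parameters: a, c, p
    return best

def fit_logarithmic(x, y):
    """y ~ c + a * log(x) (hypothetical alternative form)."""
    a, c, rss = fit_affine([math.log(xi) for xi in x], y)
    return {"a": a, "c": c, "rss": rss, "aic": aic(rss, len(y), 2)}

def fit_exp_saturation(x, y, scales=(1, 2, 4, 8, 16, 32, 64)):
    """y ~ c + a * exp(-x / s) (hypothetical alternative); s chosen from a grid."""
    best = None
    for s in scales:
        a, c, rss = fit_affine([math.exp(-xi / s) for xi in x], y)
        if best is None or rss < best["rss"]:
            best = {"a": a, "c": c, "s": s, "rss": rss}
    best["aic"] = aic(best["rss"], len(y), 3)  # parameters: a, c, s
    return best

def compare_forms(x, y):
    """Fit every candidate form and return the results keyed by form name."""
    return {
        "shifted_power_law": fit_shifted_power(x, y),
        "logarithmic": fit_logarithmic(x, y),
        "exp_saturation": fit_exp_saturation(x, y),
    }

# Synthetic placeholder data (NOT from the paper): a constant measured at
# several layer counts, generated from a log trend plus a small perturbation.
layers = [2, 4, 8, 16, 32, 64]
constant = [1.5 + 0.8 * math.log(n) + 0.02 * (-1) ** i for i, n in enumerate(layers)]

fits = compare_forms(layers, constant)
for name, fit in sorted(fits.items(), key=lambda kv: kv[1]["aic"]):
    print(f"{name:>18}: AIC={fit['aic']:.2f}, RSS={fit['rss']:.4g}")
```

With only a handful of measurement points per constant, information criteria of this kind are rough guides at best; held-out configurations (for example, a larger layer count than any used in fitting) would likely give a more convincing test of which form extrapolates.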
References
We leave the exploration of other functional dependencies to future work.
— On the Role of Batch Size in Stochastic Conditional Gradient Methods
(2603.21191 - Islamov et al., 22 Mar 2026) in Section 6.5 (Estimating Problem-Dependent Constants)