Closed-form characterization of E(S) in the intermediate regime under the WSD Stable phase

Derive a tractable closed-form expression for the data consumption function E(S), defined as the total number of tokens required to reach a fixed target loss as a function of the number of optimization steps S, on the intermediate interval S_min < S < +∞. The setting is large-scale pre-training in the Stable phase (constant learning rate) of the Warmup-Stable-Decay (WSD) learning rate schedule. Such an expression would complement the known asymptotic behaviors at S → S_min and S → +∞.

Background

The paper analyzes training dynamics under the Warmup-Stable-Decay learning rate schedule and shows that the classical critical-batch-size relationship for E(S) does not hold in the Stable phase. Through asymptotic analysis, the authors characterize E(S) in the two limits: an inverse-linear form as S → S_min and a linear form as S → +∞.
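For orientation, the classical critical-batch-size relation is commonly written in the form below, which is usually attributed to McCandlish et al. (2018). That this is exactly the relation the paper contrasts against is an assumption, and the symbol E_min (the minimum token count) does not appear in the text above.

```latex
% Classical relation (assumed form; E_min is a symbol introduced here for illustration)
\left(\frac{S}{S_{\min}} - 1\right)\left(\frac{E}{E_{\min}} - 1\right) = 1
\quad\Longrightarrow\quad
E(S) = E_{\min}\,\frac{S}{S - S_{\min}}
```

Under this form E grows without bound as S → S_min and approaches the constant E_min as S → +∞, which differs from the linear growth reported for the Stable phase.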

However, the authors explicitly state that the functional form of E(S) within the intermediate regime is unknown. To proceed, they approximate E(S) in this region by a quadratic segment within a piecewise function, enabling empirical fitting and the introduction of B_min and B_opt. A closed-form characterization of E(S) across the entire intermediate interval would strengthen the theoretical foundation and potentially improve scheduling strategies.
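The piecewise approximation described above can be sketched as follows. This is a minimal illustration, not the paper's parameterization: the branch forms a/(S - S_min) + b and k·S + d, the breakpoints s1 and s2, all coefficient values, and the matching conditions used to pin down the quadratic (value continuity at both breakpoints plus slope continuity at s2) are hypothetical choices made here.

```python
import numpy as np

def quadratic_bridge(s_min, a, b, k, d, s1, s2):
    """Return (c0, c1, c2) of a quadratic joining the two asymptotic branches.

    Hypothetical matching conditions (an assumption, not from the paper):
      q(s1)  = a / (s1 - s_min) + b   (meets the inverse-linear branch)
      q(s2)  = k * s2 + d             (meets the linear branch)
      q'(s2) = k                      (same slope as the linear branch)
    """
    left_val = a / (s1 - s_min) + b
    right_val = k * s2 + d
    system = np.array([
        [1.0, s1, s1 ** 2],
        [1.0, s2, s2 ** 2],
        [0.0, 1.0, 2.0 * s2],
    ])
    return np.linalg.solve(system, np.array([left_val, right_val, k]))

def data_consumption(s, s_min, a, b, k, d, s1, s2):
    """Piecewise E(S): inverse-linear, then quadratic, then linear. Requires s > s_min."""
    s = np.asarray(s, dtype=float)
    c0, c1, c2 = quadratic_bridge(s_min, a, b, k, d, s1, s2)
    inverse_linear = a / (s - s_min) + b
    quadratic = c0 + c1 * s + c2 * s ** 2
    linear = k * s + d
    return np.where(s <= s1, inverse_linear, np.where(s < s2, quadratic, linear))

# Illustrative, unitless values only; none of these come from the paper.
params = dict(s_min=1.0, a=1.0, b=1.0, k=0.5, d=2.0, s1=2.0, s2=8.0)
steps = np.array([1.5, 2.0, 5.0, 8.0, 20.0])
print(data_consumption(steps, **params))
```

A quadratic has three free coefficients, so three matching conditions determine it uniquely. A closed-form E(S) on the whole interval would remove the need for the breakpoints and for this kind of hand-chosen bridge.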

References

What remains an open question is the variation of $E(S)$ when $S$ falls within the intermediate interval.

— How to Set the Batch Size for Large-Scale Pre-training? (2601.05034 - Zhou et al., 8 Jan 2026) in Appendix, Subsection 'Reconstruction of E(S)'
