Closed-form characterization of E(S) in the intermediate regime under the WSD Stable phase
Derive a tractable closed-form expression for the data consumption function E(S), defined as the total number of tokens required to reach a fixed target loss when training takes S optimization steps. The expression should hold on the intermediate interval S_min < S < +∞ in large-scale pre-training under the Stable phase (constant learning rate) of the Warmup-Stable-Decay (WSD) learning-rate schedule, complementing the known asymptotic behavior as S → S_min and as S → +∞.
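Any candidate closed form has to be checked against measured (S, E) points. The sketch below shows one way to extract such points, assuming each run uses a constant batch size of B tokens per step (so E = B·S) and a constant learning rate, as in the Stable phase. It records the first step at which a run's loss reaches the target. The function names, the 1-based step convention, and the loss values in the demo are hypothetical illustrations. This is not the reconstruction procedure used in the cited paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def steps_to_target(losses: Sequence[float], target: float) -> Optional[int]:
    """Return the 1-based step at which the loss first reaches `target`, or None."""
    for step, loss in enumerate(losses, start=1):
        if loss <= target:
            return step
    return None  # this run never reached the target loss


def reconstruct_e_of_s(
    curves: Dict[int, Sequence[float]], target: float
) -> List[Tuple[int, int]]:
    """Turn per-batch-size loss curves into (S, E) pairs sorted by S.

    Hypothetical setup: keys are batch sizes in tokens per step, values are
    per-step losses from constant-learning-rate (Stable phase) runs, so the
    tokens consumed after S steps are E = batch_tokens * S.
    """
    points = []
    for batch_tokens, losses in curves.items():
        s = steps_to_target(losses, target)
        if s is not None:
            points.append((s, batch_tokens * s))
    return sorted(points)


if __name__ == "__main__":
    # Hypothetical loss curves: the larger batch needs fewer steps but more tokens.
    demo_curves = {
        1024: [3.0, 2.5, 2.2, 2.0, 1.9],
        4096: [2.8, 2.1, 1.95, 1.8],
    }
    print(reconstruct_e_of_s(demo_curves, target=2.0))
```

On these toy curves, the 4096-token run reaches the target in fewer steps but consumes more tokens. This is the trade-off that E(S) summarizes. In practice, loss curves are noisy, so one would likely smooth them before thresholding.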
References
What remains an open question is the variation of $E(S)$ when $S$ falls within the intermediate interval.
— How to Set the Batch Size for Large-Scale Pre-training?
(2601.05034 - Zhou et al., 8 Jan 2026) in Appendix, Subsection 'Reconstruction of E(S)'