Generalization of preliminary training-dynamics analysis to long training runs
Determine whether the preliminary analysis of training dynamics for reinforcement learning post-training of large language models generalizes to substantially longer training runs. The analysis covers the effects of total batch size, the decomposition of that batch into number of prompts versus generations per prompt, and the efficacy of focusing on intermediate-difficulty prompts with success probability p_π(x) ≈ 0.5.
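The quantities named in the problem statement can be illustrated with a minimal sketch. It assumes binary (0/1) rewards, so that each generation for a prompt x is a Bernoulli draw with success probability p_π(x), and it assumes the total batch is simply the number of prompts times the generations per prompt. The function names are hypothetical and are not taken from the paper; the sketch shows only the standard intuition for why prompts near p_π(x) ≈ 0.5 carry the most signal, not the paper's method.

```python
def reward_variance(p: float) -> float:
    """Variance of a Bernoulli(p) reward; largest at p = 0.5 (assumes 0/1 rewards)."""
    return p * (1.0 - p)


def prob_no_signal(p: float, n_generations: int) -> float:
    """Probability that all n generations for a prompt get the same reward.

    When every generation succeeds or every generation fails, a
    group-relative advantage is zero and the prompt contributes no gradient.
    """
    return p ** n_generations + (1.0 - p) ** n_generations


def prompts_per_batch(total_batch: int, n_generations: int) -> int:
    """Number of prompts when a fixed total batch is split as prompts x generations."""
    if n_generations <= 0 or total_batch % n_generations != 0:
        raise ValueError("total_batch must be a positive multiple of n_generations")
    return total_batch // n_generations


if __name__ == "__main__":
    # Illustrative values only; they are not results from the paper.
    for p in (0.1, 0.5, 0.9):
        print(p, reward_variance(p), prob_no_signal(p, 4))
    print(prompts_per_batch(512, 8))
```

Under these assumptions, a prompt with p_π(x) = 0.5 has the highest reward variance and the lowest chance of producing an all-success or all-failure group. Whether this advantage persists as the policy improves over a long run is the open question stated above.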
References
Although we observe strong early-stage performance, it remains an open question whether our analysis in Section~\ref{sec:preliminary_investigation} would generalize to much longer training runs.
— Prompt Curriculum Learning for Efficient LLM Post-Training
(2510.01135 - Gao et al., 1 Oct 2025) in Limitations, subsection "Limited training horizon"