Optimal allocation of prompts versus generations per prompt at very large batch sizes

Determine whether, at very large global batch sizes (e.g., 2k+ completions per step), allocating more prompts or more generations per prompt yields better asymptotic performance and compute efficiency. Derive a principled rule for choosing this split under a fixed total batch.
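The fixed-batch constraint is simply total_batch = prompts × generations_per_prompt, so each candidate allocation is one factorization of the total. The sketch below enumerates the valid splits for a given list of generation counts. The function name `allocations` and the totals 768 and 2048 are hypothetical illustrations, not values taken from the source. Note that at 2048 completions per step, 24 generations per prompt has no integer prompt count, so a sweep at that scale needs a different grid or a total divisible by every option.

```python
def allocations(total_batch: int, gens_options: list[int]) -> list[tuple[int, int]]:
    """Return (prompts, generations_per_prompt) pairs whose product is exactly total_batch.

    Hypothetical helper: splits that would require a fractional number of
    prompts are skipped rather than rounded, so the total stays fixed.
    """
    pairs = []
    for g in gens_options:
        if total_batch % g == 0:  # only exact splits keep the total batch fixed
            pairs.append((total_batch // g, g))
    return pairs


sweep = [8, 16, 24, 32]  # generation counts from the referenced sweep
print(allocations(768, sweep))   # [(96, 8), (48, 16), (32, 24), (24, 32)]
print(allocations(2048, sweep))  # [(256, 8), (128, 16), (64, 32)]
```

Skipping instead of rounding is deliberate: rounding the prompt count would silently change the total batch and confound the comparison the problem asks for.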

References

For a fixed total batch, is it better to allocate more prompts or more generations per prompt? Sweeping generations per prompt over {8, 16, 24, 32}, while adjusting the number of prompts to keep the total batch fixed, leaves the fitted scaling curves essentially unchanged (see the paper's large-scale appendix). This suggests that, at moderate batch sizes, the allocation is a second-order choice for both the asymptotic performance A and the compute-efficiency exponent B. Clearer differences may emerge at much larger batches (e.g., 2k+), which we leave for future work.

Source: The Art of Scaling Reinforcement Learning Compute for LLMs (Khatri et al., 15 Oct 2025), Section 4 (Predictable Scaling Returns Across RL Compute Axes), paragraph "Generations per prompt (fixed total batch)".
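To make "second-order choice for A and B" concrete, the sketch below assumes a saturating sigmoid of the form R(C) = R0 + (A - R0) / (1 + (C_mid / C)^B), where A is the asymptotic reward and B controls how quickly reward rises with compute C. Under that assumption, two allocations are equivalent at the level the quote describes if their fitted A and B agree within some tolerance. The class `SigmoidFit`, the function `second_order`, the tolerances, and all parameter values are hypothetical and made up for illustration; they are not fits reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigmoidFit:
    A: float      # assumed asymptotic reward (performance ceiling)
    B: float      # assumed scaling exponent (compute efficiency)
    C_mid: float  # compute at which half of the gain over R0 is reached
    R0: float     # reward at the start of training

    def reward(self, C: float) -> float:
        """Predicted reward after C units of RL compute (requires C > 0)."""
        return self.R0 + (self.A - self.R0) / (1.0 + (self.C_mid / C) ** self.B)


def second_order(fits: list[SigmoidFit], tol_A: float, tol_B: float) -> bool:
    """True if the fitted A values and the fitted B values each span no more than their tolerance."""
    As = [f.A for f in fits]
    Bs = [f.B for f in fits]
    return (max(As) - min(As) <= tol_A) and (max(Bs) - min(Bs) <= tol_B)


# Made-up fits for two allocations of the same total batch (8 vs 32 generations per prompt).
fit_8 = SigmoidFit(A=0.61, B=1.5, C_mid=1000.0, R0=0.2)
fit_32 = SigmoidFit(A=0.60, B=1.6, C_mid=1100.0, R0=0.2)
print(second_order([fit_8, fit_32], tol_A=0.02, tol_B=0.2))  # True
```

A principled allocation rule at 2k+ batch would then amount to finding where this check starts to fail, i.e., where the choice of split moves A or B by more than the fitting noise.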
