Scale dependence of algorithmic progress parameters
Investigate whether algorithmic progress in the relation C_m = k log F_m + b is scale-dependent by determining whether the slope parameter k varies across training compute regimes, and quantify how the reduction in compute requirements differs between FLOP scales (e.g., 10^21 versus 10^25 FLOP).
References
Most crucially, algorithmic progress could also involve changing k, such that the rate of algorithmic progress might depend on the specific scale of compute in question -- for example, the rate of reduction in compute requirements might be faster at 10^25 FLOP compared to 10^21 FLOP. Unfortunately, we do not have sufficient data to test this scale-dependence of algorithmic progress in detail, so for the purposes of this paper we present our results assuming scale-independence.
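To make the distinction concrete, the following minimal sketch compares a change in b alone with a change in k. It assumes that C_m is a capability score, that F_m is training compute in FLOP, and that the logarithm is base 10; none of these are confirmed by the passage above. The function names and all parameter values are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math


def required_flop(capability, k, b):
    """Invert C = k * log10(F) + b to get the compute F needed for a capability.

    Assumption: base-10 logarithm; the source does not state the base.
    """
    return 10 ** ((capability - b) / k)


def compute_reduction(old_flop, old_params, new_params):
    """Factor by which required compute falls when (k, b) changes.

    The capability target is whatever the old parameters achieve at old_flop.
    """
    k_old, b_old = old_params
    k_new, b_new = new_params
    capability = k_old * math.log10(old_flop) + b_old
    return old_flop / required_flop(capability, k_new, b_new)


# Hypothetical parameter values, for illustration only.
baseline = (1.0, 0.0)        # (k, b) before algorithmic progress
shift_only = (1.0, 1.0)      # progress raises b, k unchanged
slope_change = (1.05, 0.0)   # progress raises k, b unchanged

for flop in (1e21, 1e25):
    r_shift = compute_reduction(flop, baseline, shift_only)
    r_slope = compute_reduction(flop, baseline, slope_change)
    print(f"{flop:.0e} FLOP: b-shift {r_shift:.1f}x, k-change {r_slope:.1f}x")
```

With these illustrative values, the change in b gives roughly a 10x reduction at both scales, whereas the change in k gives roughly 10x at 10^21 FLOP and roughly 15.5x at 10^25 FLOP. Algebraically, the reduction in log10 F is (b_new - b_old) / k when only b changes, which does not depend on F, while a change in k makes the reduction depend on the capability target and hence on the compute scale.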