Training-time theory for the serial vs. parallel contrast
Establish rigorous theorems that characterize a parallel-versus-serial computation contrast during neural network training, analogous to the inference-time contrast analyzed in the paper. Such theorems would provide a formal learning-theoretic account of how serial computation requirements manifest in the training phase.
References
Because learning theory is extremely difficult, we could neither find theorems proven by people that came before us, nor prove theorems ourselves.
— The Serial Scaling Hypothesis
(2507.12549 - Liu et al., 16 Jul 2025) in Section 3.3, Training vs. Inference (Potential Misconceptions)