Conjectured closeness of SGD loss to deterministic Volterra solution

Establish that, for the power-law random features (PLRF) model trained with one-pass stochastic gradient descent (SGD) initialized at θ0 = 0, the expected loss E[P(θr) | W] stays within a multiplicative factor (1 ± ε), for any ε > 0, of the deterministic solution P(r) of the convolution-type Volterra equation built from the deterministic-equivalent resolvent. The bound should hold uniformly over iterations r and over all admissible dimensions V and d, with probability tending to 1 as d → ∞.

Background

The paper replaces the random resolvent (Ĥ − z)^{-1} with a deterministic equivalent R(z) and defines a deterministic loss trajectory P(r) via a convolution-type Volterra equation. This deterministic trajectory numerically matches SGD training dynamics and enables scaling-law analysis.
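As a rough illustration of what "convolution-type Volterra equation" means computationally, the sketch below solves a generic discrete analogue, P[r] = F[r] + sum over s < r of K[r-1-s] * P[s], by forward substitution. The function name `solve_discrete_volterra`, the discretization, and the power-law forcing and kernel are assumptions chosen for illustration; they are not the paper's forcing and kernel functions, which are built from the deterministic-equivalent resolvent R(z).

```python
import numpy as np

def solve_discrete_volterra(forcing, kernel):
    """Solve P[r] = forcing[r] + sum_{s<r} kernel[r-1-s] * P[s] by forward substitution.

    `forcing` and `kernel` are 1-D arrays of equal length (hypothetical inputs).
    """
    forcing = np.asarray(forcing, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n = forcing.shape[0]
    P = np.empty(n)
    for r in range(n):
        # kernel[:r] reversed lines up kernel[r-1-s] with P[s] for s = 0..r-1;
        # for r = 0 both slices are empty and the dot product is 0.
        P[r] = forcing[r] + np.dot(kernel[:r][::-1], P[:r])
    return P

# Toy inputs (assumed, not taken from the paper): power-law forcing and kernel.
r = np.arange(50)
forcing = (1.0 + r) ** -1.5
kernel = 0.1 * (1.0 + r) ** -2.0
P = solve_discrete_volterra(forcing, kernel)
```

Because the equation is causal (P[r] depends only on earlier values), a single forward pass suffices; the cost is quadratic in the number of iterations in this naive form.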

To connect the stochastic process to its deterministic approximation, the authors conjecture that the expected SGD loss E[P(θr) | W] closely tracks the deterministic solution P(r) across all iterations and admissible dimensions, with high probability as d grows. Proving this would justify using the deterministic equivalent to analyze compute-optimal scaling laws and phase behavior.
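A numerical sanity check of this kind of closeness could look like the sketch below. It assumes two hypothetical arrays indexed by iteration r: an empirical estimate of E[P(θr) | W] (for example, an average over SGD runs with W held fixed) and the deterministic values P(r). It tests the two-sided reading in which every per-iteration ratio lies in [1 − ε, 1 + ε]; the function names and the sample numbers are illustrative only, and passing such a check on finitely many iterations is evidence, not a proof.

```python
import numpy as np

def ratio_band(expected_loss, deterministic_loss):
    """Return (min, max) over iterations of expected_loss[r] / deterministic_loss[r]."""
    e = np.asarray(expected_loss, dtype=float)
    p = np.asarray(deterministic_loss, dtype=float)
    ratio = e / p
    return float(ratio.min()), float(ratio.max())

def within_band(expected_loss, deterministic_loss, eps):
    """True if every per-iteration ratio lies in [1 - eps, 1 + eps]."""
    lo, hi = ratio_band(expected_loss, deterministic_loss)
    return (1.0 - eps) <= lo and hi <= (1.0 + eps)

# Made-up numbers for illustration (not results from the paper).
empirical = [1.0, 0.52, 0.24]
deterministic = [1.0, 0.50, 0.25]
ok_loose = within_band(empirical, deterministic, eps=0.05)
ok_tight = within_band(empirical, deterministic, eps=0.01)
```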

References

We conjecture that the two processes are close: for $\{\theta_r\}$ the sequence of iterates generated by SGD with $\theta_0 = 0$ and any $\varepsilon > 0$, $(1 - \varepsilon) \le \sup_{r \in \mathbb{N}} \bigg\{ \frac{\mathbb{E}[\mathscr{P}(\theta_r) \vert W]}{\mathscr{P}(r)} \bigg\} \le (1 + \varepsilon)$, for all admissible $V$, $d$ with probability going to 1 as $d \to \infty$. We leave this for future research and suspect it is true because of deterministic equivalence for random matrices and our numerical simulations.

4+3 Phases of Compute-Optimal Neural Scaling Laws (2405.15074 - Paquette et al., 23 May 2024) in Subsection “Deterministic equivalent of the loss under SGD” (Section: Analysis of Volterra equation under the deterministic equivalent)