Conjectured closeness of SGD loss to deterministic Volterra solution
Establish that, for the power-law random features (PLRF) model trained with one-pass stochastic gradient descent (SGD) initialized at $\theta_0 = 0$, the expected loss trajectory $\mathbb{E}[\mathscr{P}(\theta_r) \mid W]$ stays uniformly within a multiplicative factor $(1 \pm \varepsilon)$ of the deterministic solution $\mathscr{P}(r)$ of the convolution-type Volterra equation built from the deterministic equivalent resolvent. The closeness should hold uniformly over iterations $r$ and for all admissible dimensions $v$ and $d$, with probability tending to 1 as $d \to \infty$.
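For orientation, and suppressing all model-specific ingredients, a convolution-type Volterra equation of this kind has the schematic form below; the precise forcing function $F$ and kernel $K$, both built from the deterministic equivalent resolvent, are left unspecified here.
\[
\mathscr{P}(r) \;=\; F(r) + \int_0^r K(r - s)\, \mathscr{P}(s)\, \mathrm{d}s .
\]
In this schematic, $F(r)$ plays the role of the noise-free (gradient-flow) loss curve, while the convolution with $K$ accounts for the SGD sampling noise, as is typical in this line of work.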
We conjecture that the two processes are close: for $\{\theta_r\}$ the sequence of iterates generated by SGD with $\theta_0 = 0$ and for any $\varepsilon > 0$,
\[
(1 - \varepsilon) \;\le\; \inf_{r \in \mathbb{N}} \left\{ \frac{\mathbb{E}[\mathscr{P}(\theta_r) \mid W]}{\mathscr{P}(r)} \right\} \;\le\; \sup_{r \in \mathbb{N}} \left\{ \frac{\mathbb{E}[\mathscr{P}(\theta_r) \mid W]}{\mathscr{P}(r)} \right\} \;\le\; (1 + \varepsilon),
\]
for all admissible $v$, $d$, with probability going to 1 as $d \to \infty$. We leave a proof to future research; deterministic equivalence for random matrices and our numerical simulations both suggest that the conjecture holds.
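As a concrete illustration of the ratio check in the conjecture, the following minimal Python sketch sets up a PLRF-style least-squares problem (the spectra, scalings, and all parameter values are illustrative assumptions, not the model's exact normalization), averages the population loss of one-pass SGD over Monte Carlo runs to estimate $\mathbb{E}[\mathscr{P}(\theta_r) \mid W]$, and compares it against an exact closed-form evolution of that same conditional expectation, computed from Gaussian fourth moments, which stands in here for the deterministic Volterra solution $\mathscr{P}(r)$; constructing $\mathscr{P}(r)$ from the deterministic equivalent resolvent itself is beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative PLRF-style setup (all spectra/scalings are assumptions).
v, d = 256, 64            # hidden and embedded dimensions (assumed roles)
alpha, beta = 1.0, 1.2    # power-law exponents (assumed)
gamma = 0.3               # SGD step size (assumed, chosen in the stable range)
T = 1000                  # number of one-pass SGD iterations
n_runs = 100              # Monte Carlo repetitions of SGD

D = np.arange(1, v + 1) ** (-alpha)           # data spectrum, j^{-alpha}
b = np.arange(1, v + 1) ** (-beta)            # target coefficients, j^{-beta}
W = rng.standard_normal((d, v)) / np.sqrt(v)  # random feature matrix

# Population quantities conditional on W: with z ~ N(0, I_v), the
# features are phi = W D z and the target is y = b^T D z, so that
K = (W * D**2) @ W.T            # K = E[phi phi^T]
m = W @ (D**2 * b)              # m = E[phi y]
c = float(np.sum(D**2 * b**2))  # c = E[y^2]

def pop_loss(theta):
    """Population loss P(theta) = 0.5 * E[(y - phi^T theta)^2]."""
    return 0.5 * (theta @ K @ theta - 2.0 * theta @ m + c)

# Monte Carlo estimate of E[P(theta_r) | W] over SGD's sampling noise.
mc = np.zeros(T + 1)
for _ in range(n_runs):
    theta = np.zeros(d)
    mc[0] += pop_loss(theta)
    for r in range(1, T + 1):
        x = D * rng.standard_normal(v)            # fresh sample (one pass)
        phi, y = W @ x, b @ x
        theta += gamma * (y - phi @ theta) * phi  # SGD step on 0.5*(y - phi^T theta)^2
        mc[r] += pop_loss(theta)
mc /= n_runs

# Exact evolution of E[P(theta_r) | W] via Gaussian fourth moments
# (Isserlis' theorem), used as a stand-in for the Volterra solution P(r).
mu, M = np.zeros(d), np.zeros((d, d))  # E[theta_r], E[theta_r theta_r^T]
det = np.zeros(T + 1)
det[0] = 0.5 * c
for r in range(1, T + 1):
    g2 = c - 2.0 * m @ mu + np.trace(K @ M)  # E[(y - phi^T theta_r)^2]
    M = (M + gamma * (np.outer(m, mu) + np.outer(mu, m) - K @ M - M @ K)
           + gamma**2 * (g2 * K + 2.0 * (np.outer(m, m)
                         - np.outer(m, mu) @ K - K @ np.outer(mu, m)
                         + K @ M @ K)))
    mu = mu + gamma * (m - K @ mu)
    det[r] = 0.5 * (np.trace(K @ M) - 2.0 * m @ mu + c)

ratio = mc / det
print(f"inf_r ratio = {ratio.min():.4f}, sup_r ratio = {ratio.max():.4f}")
```

Under these assumptions the printed infimum and supremum of the ratio mirror the two-sided bound in the conjecture: as n_runs grows, both should approach 1 up to Monte Carlo error, with the conjectured statement replacing the exact conditional expectation by the deterministic-equivalent Volterra curve.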