Refine stability bounds for sieve SGD beyond the $a + 2w\tau < 1$ requirement
Establish improved first-order stability bounds for the sieve stochastic gradient descent (sieve SGD) estimator defined by the update
\[
\hat f_i = \hat f_{i-1} + \alpha_i \bigl(Y_i - \langle \hat f_{i-1}, Z_i \rangle\bigr)\, \Lambda_i Z_i,
\]
with learning rates $\alpha_i = c\, i^{-a}$, shrinkage exponent $w \ge 0$ (entering $\Lambda_i$ through its diagonal entries $j^{-2w}\,\mathbf{1}(j \le J_i)$), and basis growth $J_i = i^{\tau}$. The bounds should hold without the restriction $a + 2w\tau < 1$ and $w > 1/2$ of Proposition 6.4, or under a significantly weaker one. In particular, derive stability rates compatible with practical choices (including $w$ close to $0$), and quantify the precise dependence on $(a, \tau, w)$ required by the rolling-validation stability condition in Assumption 3.3.
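To make the recursion concrete, below is a minimal sketch of one pass of the sieve SGD update, tracking $\hat f$ through its coefficients in a fixed basis. The function name `sieve_sgd`, the `basis` callable, the default parameter values, and the rounding $J_i = \lceil i^{\tau} \rceil$ are illustrative assumptions, not part of the statement above.

```python
import numpy as np

def sieve_sgd(X, Y, basis, c=1.0, a=0.75, w=0.01, tau=0.5, J_max=None):
    """One pass of sieve SGD (sketch). `basis(x, J)` should return the
    vector (phi_1(x), ..., phi_J(x)); its choice is an assumption here."""
    n = len(Y)
    if J_max is None:
        J_max = int(np.ceil(n ** tau))
    theta = np.zeros(J_max)  # basis coefficients of \hat f_0 = 0
    for i in range(1, n + 1):
        J_i = min(int(np.ceil(i ** tau)), J_max)  # basis growth J_i = i^tau
        z = basis(X[i - 1], J_i)                  # Z_i truncated at J_i
        alpha = c * i ** (-a)                     # learning rate alpha_i = c i^{-a}
        lam = np.arange(1, J_i + 1) ** (-2.0 * w) # diag of Lambda_i: j^{-2w} 1(j <= J_i)
        resid = Y[i - 1] - theta[:J_i] @ z        # Y_i - <\hat f_{i-1}, Z_i>
        theta[:J_i] += alpha * resid * lam * z    # shrunken (preconditioned) SGD step
    return theta

# Hypothetical usage with a cosine basis phi_j(x) = sqrt(2) cos(pi j x) on [0, 1]:
rng = np.random.default_rng(0)
X = rng.uniform(size=2000)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(2000)
cos_basis = lambda x, J: np.sqrt(2.0) * np.cos(np.pi * np.arange(1, J + 1) * x)
theta_hat = sieve_sgd(X, Y, cos_basis, c=1.0, a=0.75, w=0.01, tau=0.5)
```

Note that small $w$ (here $w = 0.01$) leaves the diagonal entries $j^{-2w}$ close to $1$, i.e. almost no shrinkage, which is exactly the regime the current stability condition struggles to cover.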
References
A caveat here is that such choices of $(a, \tau_1, \tau_2)$ are incompatible with the requirement $a + 2w\tau < 1$ in \Cref{pro:6.4-stab-sieve-sgd} unless we allow $w < 1/4$. We further conjecture that this requirement is an artifact of the current proof and can be weakened by a more refined argument. Intuitively, a larger value of $w$ means stronger shrinkage, which should make the estimate more, not less, stable. One piece of evidence supporting this conjecture is that consistency of sieve SGD can be established even with $w = 0$.
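For a worked illustration with hypothetical values (the specific $(a, \tau_1, \tau_2)$ referred to above are not given here): taking $a = 3/4$ and $\tau = 1/2$, the condition reads
\[
a + 2w\tau = \tfrac{3}{4} + w < 1 \quad\Longleftrightarrow\quad w < \tfrac{1}{4},
\]
so every shrinkage level $w \ge 1/4$ is excluded, even though stronger shrinkage should intuitively improve stability.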