Refine stability bounds for sieve SGD beyond the a + 2wτ < 1 requirement

Establish improved first-order stability bounds for the sieve stochastic gradient descent estimator defined by the update \hat f_i = \hat f_{i-1} + \alpha_i (Y_i − ⟨\hat f_{i-1}, Z_i⟩) Λ_i Z_i, with learning rates \alpha_i = c i^{−a}, shrinkage exponent w ≥ 0 (entering Λ_i via diagonal entries j^{−2w} 1(j ≤ J_i)), and basis growth J_i = i^{\tau}. The bounds should hold without the restrictions a + 2 w τ < 1 and w > 1/2 of Proposition 6.4, or under significantly weaker ones. In particular, derive stability rates compatible with practical choices (including w close to 0) and quantify the precise dependence on (a, τ, w) needed for the rolling-validation stability condition in Assumption 3.3.

Background

Section 6.4 analyzes the stability of online estimators, focusing on sieve stochastic gradient descent (SGD) for nonparametric regression with the update \hat f_i = \hat f_{i-1} + \alpha_i (Y_i − ⟨\hat f_{i-1}, Z_i⟩) Λ_i Z_i and tuning sequences J_i = i^{\tau}, \alpha_i = c i^{−a}, and shrinkage exponent w. Proposition 6.4 proves a first-order stability bound under conditions including w > 1/2 and a + 2 w τ < 1, which can be restrictive in practice.
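To make the roles of (a, τ, w) concrete, the update above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the cosine basis on [0, 1], the function name sieve_sgd, the cap max_basis, and the rounding of J_i = i^{\tau} down to an integer (with a minimum of 1) are all assumptions introduced here.

```python
import numpy as np


def sieve_sgd(X, Y, c=1.0, a=0.5, tau=0.3, w=0.6, max_basis=50):
    """One pass of the sieve SGD update over scalar inputs X in [0, 1].

    Returns the coefficient vector of the final iterate \\hat f_n in an
    assumed cosine basis (the basis choice is hypothetical).
    """
    j = np.arange(1, max_basis + 1, dtype=float)  # basis indices 1..max_basis
    coef = np.zeros(max_basis)                    # \hat f_0 = 0
    for i in range(1, len(X) + 1):
        x, y = X[i - 1], Y[i - 1]
        # Basis growth J_i = i^tau, rounded down and capped (assumption).
        J_i = min(max_basis, max(1, int(np.floor(i ** tau))))
        # Learning rate alpha_i = c * i^{-a}.
        alpha_i = c * i ** (-a)
        # Feature vector Z_i: phi_1 = 1, phi_j = sqrt(2) cos((j - 1) pi x).
        z = np.sqrt(2.0) * np.cos((j - 1.0) * np.pi * x)
        z[0] = 1.0
        # Diagonal of Lambda_i: j^{-2w} for j <= J_i, zero otherwise.
        lam = np.where(j <= J_i, j ** (-2.0 * w), 0.0)
        # Residual Y_i - <\hat f_{i-1}, Z_i>, then the shrunken gradient step.
        resid = y - coef @ z
        coef = coef + alpha_i * resid * lam * z
    return coef
```

Because Λ_i zeroes out every coordinate beyond J_i, coefficients with index above J_n are never updated, and a larger w damps the step taken in high-frequency coordinates. Setting w=0.0 in this sketch gives the unshrunken variant.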

In Remark 6.4, the authors observe that these stability conditions may be incompatible with parameter choices that yield optimal convergence rates and conjecture that the requirement a + 2 w τ < 1 is an artifact of the current proof. They suggest that refined analysis should allow weaker constraints on w (potentially even w = 0), consistent with known consistency results for sieve SGD without shrinkage. The open problem seeks sharper stability bounds that relax these constraints while preserving the rolling-validation stability needed for online model selection consistency.

References

A caveat here is that such choices of (a, \tau_1, \tau_2) are incompatible with the requirement of a+2w\tau < 1 in Proposition 6.4 unless we allow for w<1/4. We further conjecture that this requirement is an artifact of the current proof and can be improved using a more refined argument. Intuitively, a larger value of w implies stronger shrinkage, hence making the estimate more stable. An evidence supporting this conjecture is that one can establish consistency of sieve SGD with w=0.

A Modern Theory of Cross-Validation through the Lens of Stability (2505.23592 - Lei, 29 May 2025) in Remark 6.4, Section 6.4