Extend GD effective-variance bound beyond Gaussian design

Establish, under the subgaussian covariate assumption that the entries of Σ^{-1/2} x are independent and σ_x^2-subgaussian (Assumption 1), an effective-variance bound of order O(D/n + (D_1/n)^2) for early-stopped gradient descent on well-specified linear regression with step size η ≤ 1/(2 tr(Σ)) and stopping time t ≤ b n, where D = k* + (η t)^2 ∑_{i>k*} λ_i^2, D_1 = k* + η t ∑_{i>k*} λ_i, (λ_i) are the eigenvalues of the covariance Σ in nonincreasing order, and k* = min{ k : 1/(η t) ≥ c_2 λ_{k+1} } for a fixed constant c_2 > 0.
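
For concreteness, here is a minimal NumPy sketch of how k*, D, and D_1 are computed from a given spectrum, step size η, and stopping time t, following the definitions above. The function name effective_dims and the default c2 = 1 are hypothetical choices for illustration, not quantities taken from the paper.

    import numpy as np

    def effective_dims(lams, eta, t, c2=1.0):
        """Compute k*, D, and D_1 from a spectrum (lams), step size eta,
        and stopping time t, following the definitions above.
        c2 is the fixed constant in the cutoff defining k*; the default 1.0
        is an arbitrary placeholder."""
        lams = np.sort(np.asarray(lams, dtype=float))[::-1]  # enforce nonincreasing order
        thresh = 1.0 / (eta * t)
        # k* = min{ k : 1/(eta t) >= c2 * lambda_{k+1} }; lams[k] is lambda_{k+1} (0-indexed)
        k_star = next((k for k in range(len(lams)) if thresh >= c2 * lams[k]), len(lams))
        tail = lams[k_star:]                                  # eigenvalues lambda_i with i > k*
        D = k_star + (eta * t) ** 2 * np.sum(tail ** 2)
        D1 = k_star + eta * t * np.sum(tail)
        return k_star, D, D1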

Background

The paper proves an SGD-style upper bound for gradient descent (GD) that decomposes excess risk into variance, effective bias, and effective variance terms. Under Gaussian covariates, the effective variance term tightens to O(D/n + (D_1/n)^2) by leveraging stronger concentration (Lemma tail-concentration).

Under the broader subgaussian design assumption (Assumption 1), the current proof yields only O(D_1/n) for the effective variance term. The authors conjecture that the Gaussian-only improvement is a technical artifact and expect the tighter O(D/n + (D_1/n)^2) bound to hold under Assumption 1 as well. Note that by the definition of k*, η t λ_i ≤ 1/c_2 for every i > k*, so D is at most a constant multiple of D_1; hence the conjectured rate is never worse than O(D_1/n) when D_1 ≲ n, and it can be substantially smaller when the tail eigenvalues decay slowly.
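
To see the gap between the two rates on a concrete instance, the effective_dims helper sketched above can be used to compare O(D_1/n) against O(D/n + (D_1/n)^2). The spectrum, sample size, step size η = 1/(2 tr(Σ)), stopping time t = n, and c2 = 1 below are purely illustrative choices, not values from the paper.

    import numpy as np

    # Illustrative only: lambda_i = i^{-2} truncated at 1000 dimensions, n = 500 samples,
    # step size at the boundary eta = 1/(2 tr(Sigma)), stopping time t = n, c2 = 1.
    lams = 1.0 / np.arange(1, 1001) ** 2
    n = 500
    eta, t = 1.0 / (2.0 * lams.sum()), n
    k_star, D, D1 = effective_dims(lams, eta, t, c2=1.0)
    print("k* =", k_star)
    print("current bound under Assumption 1, O(D_1/n):  ", D1 / n)
    print("conjectured bound, O(D/n + (D_1/n)^2):       ", D / n + (D1 / n) ** 2)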

References

Note that in \Cref{thm:gd:exp}, the effective variance bound improves from \bigO(D_1/n) to \bigO\big(D/n + (D_1/n)^2\big) when the covariates are exactly Gaussian. This is because Gaussian design allows us to prove a stronger concentration bound (see \Cref{lemma:tail-concentration} in \Cref{append:sec:gd:exp}). However, we conjecture this is just a technical artifact and that the \bigO\big(D/n + (D_1/n)^2\big) effective variance bound holds under the more general \Cref{assum:upper-bound:item:x}. We leave this as future work.

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization (2509.17251 - Wu et al., 21 Sep 2025) in Remark (role of Gaussian design) after Theorem 4 (an upper bound for GD), Section 4.2; An SGD-type bound for GD