Extend GD effective-variance bound beyond Gaussian design
Establish an effective-variance bound of order O(D/n + (D_1/n)^2) for early-stopped gradient descent on well-specified linear regression, under the subgaussian covariate assumption that the entries of Σ^{-1/2}x are independent and σ_x^2-subgaussian (Assumption 1), with step size η ≤ 1/(2 tr(Σ)) and stopping time t ≤ b n. Here D = k* + (η t)^2 ∑_{i>k*} λ_i^2, D_1 = k* + η t ∑_{i>k*} λ_i, (λ_i) are the eigenvalues of the covariance Σ in nonincreasing order, and k* = min{ k : 1/(η t) ≥ c_2 λ_{k+1} } for a fixed constant c_2 > 0.
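The quantities above can be computed directly from a spectrum. A minimal sketch (the function name `effective_dims` and the polynomial-decay spectrum are illustrative choices, not from the source): given eigenvalues λ_1 ≥ λ_2 ≥ …, it finds the effective rank k* at which 1/(η t) ≥ c_2 λ_{k+1}, then sums the tail to form D and D_1.

```python
import numpy as np

def effective_dims(lams, eta, t, c2=1.0):
    """Compute k*, D, and D_1 for a given spectrum and stopping time.

    lams : eigenvalues of Sigma in nonincreasing order (1-indexed in the
           text, so lam_{k+1} corresponds to lams[k] here).
    """
    thresh = 1.0 / (eta * t)
    # k* = min{ k : 1/(eta t) >= c2 * lam_{k+1} }
    k_star = next((k for k in range(len(lams)) if thresh >= c2 * lams[k]),
                  len(lams))
    tail = lams[k_star:]  # eigenvalues with index i > k*
    D = k_star + (eta * t) ** 2 * np.sum(tail ** 2)
    D1 = k_star + eta * t * np.sum(tail)
    return k_star, D, D1

# Illustrative polynomial-decay spectrum lam_i = i^{-2}; eta is well below
# 1/(2 tr(Sigma)) since tr(Sigma) < pi^2/6 here.
lams = np.array([1.0 / (i + 1) ** 2 for i in range(100)])
k_star, D, D1 = effective_dims(lams, eta=0.1, t=100)
```

Since η t λ_i ≤ 1/c_2 for every i > k*, each tail term of D is at most the corresponding term of D_1 (when c_2 ≥ 1), so D ≤ D_1 and the O(D/n + (D_1/n)^2) bound improves on O(D_1/n) for large n.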
Note that in \Cref{thm:gd:exp}, the effective variance bound improves from \bigO(D_1/n) to \bigO\big(D/n + (D_1/n)^2\big) when the covariates are exactly Gaussian. This is because Gaussian design allows us to prove a stronger concentration bound (see \Cref{lemma:tail-concentration} in \Cref{append:sec:gd:exp}). However, we conjecture that this gap is merely a technical artifact and that the \bigO\big(D/n + (D_1/n)^2\big) effective variance bound also holds under the more general \Cref{assum:upper-bound:item:x}. We leave this as future work.