GD dominance versus ridge with negative regularization and moderate stepsizes

Determine whether gradient descent with moderate step sizes n/∥XX^T∥ < η < 2n/∥XX^T∥ continues to dominate ridge regression in excess risk when ridge regularization is allowed to be negative (λ < 0) for well-specified random-design linear regression under the paper’s assumptions on covariates and noise.

Background

The core comparison results in the paper establish that, for λ ≥ 0 and stable GD stepsizes η ≤ n/∥XX^T∥, GD’s excess risk is never more than a constant factor larger than ridge’s and can be polynomially better on some instances.

Prior work shows that negative ridge regularization can be beneficial in certain regimes, and the authors suggest extending the GD analysis to moderate stepsizes, where GD oscillates in iterates but still monotonically decreases the empirical risk. Whether GD still dominates ridge under λ < 0 remains unresolved.
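The moderate-stepsize behavior can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the empirical risk is (1/(2n))∥Xw − y∥², so that the stepsize thresholds n/∥XX^T∥ and 2n/∥XX^T∥ correspond to 1 and 2 divided by the top eigenvalue of X^T X/n. The synthetic Gaussian data, the sizes n and d, the choice η = 1.5·n/∥XX^T∥, and the helper name gd_path are all hypothetical.

```python
import numpy as np

def gd_path(X, y, eta, steps):
    """Run GD from zero on the (assumed) empirical risk (1/(2n)) * ||Xw - y||^2."""
    n, d = X.shape
    w = np.zeros(d)
    risks, iterates = [], []
    for _ in range(steps):
        risks.append(0.5 * np.mean((X @ w - y) ** 2))
        iterates.append(w.copy())
        w = w - eta * X.T @ (X @ w - y) / n  # gradient step
    return np.array(risks), np.array(iterates)

# Synthetic well-specified data (illustrative sizes, n > d).
rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Spectral norm of X X^T, then a stepsize inside the moderate range.
top = np.linalg.norm(X @ X.T, 2)
eta = 1.5 * n / top
risks, iterates = gd_path(X, y, eta, steps=20)

# Error along the top right singular direction of X, measured
# relative to the least-squares solution.
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
v1 = np.linalg.svd(X, full_matrices=False)[2][0]
proj = (iterates - w_hat) @ v1

print("risk decreases at every step:", bool(np.all(np.diff(risks) < 0)))
print("error along top direction flips sign each step:",
      bool(np.all(proj[:-1] * proj[1:] < 0)))
```

Under these assumptions the error along the top direction is multiplied by 1 − 1.5 = −0.5 at each step, which produces the sign flips, while every eigen-direction still contracts in magnitude, so the empirical risk keeps falling.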

References

It is unclear if GD still dominates ridge regression when allowing λ < 0. We conjecture that this is true when extending the stepsize for GD from small ones to moderate ones, n/∥XX^T∥ < η < 2n/∥XX^T∥, with which GD oscillates in iterates but still monotonically decreases the empirical risk. This is left for future investigation.

— Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization (2509.17251 - Wu et al., 21 Sep 2025) in Concluding remarks, paragraph “Negative ridge and oscillatory GD”
