On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares (2007.13288v2)

Published 27 Jul 2020 in math.NA, cs.LG, cs.NA, math.OC, and stat.ML

Abstract: We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that $$\mathbb{E}\,\left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right)\left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2}\left\|A^T A(x_k - x)\right\|_2^2.$$ This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms. If $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
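The described effect is easy to probe numerically. Below is a minimal Python sketch, not taken from the paper: it runs SGD on $\|Ax - b\|_2^2$ with rows sampled proportionally to their squared norms (the randomized Kaczmarz view of SGD, which is an assumption about the paper's exact sampling scheme), on a synthetic invertible $A$ with a wide singular spectrum, and then measures the remaining error along the largest and smallest singular directions. The test matrix, step rule, and iteration count are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test problem: invertible A with a wide singular spectrum.
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -3, n)            # singular values from 1 down to 1e-3
A = U @ np.diag(s) @ V.T
x_true = rng.standard_normal(n)
b = A @ x_true                       # consistent system

# SGD on ||Ax - b||_2^2, sampling row i with prob ||a_i||^2 / ||A||_F^2
# (randomized Kaczmarz viewed as SGD -- an assumed setup, not the paper's code).
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()

x = np.zeros(n)
for k in range(5000):
    i = rng.choice(n, p=probs)
    a_i = A[i]
    # Kaczmarz step: project x onto the hyperplane a_i^T x = b_i
    x = x - ((a_i @ x - b[i]) / row_norms2[i]) * a_i

# Error components along the right singular directions: large singular
# directions should be damped much faster than small ones.
err = V.T @ (x - x_true)
print("error along largest singular direction :", abs(err[0]))
print("error along smallest singular direction:", abs(err[-1]))
```

On a spectrum spanning three orders of magnitude, the component along the top singular direction typically shrinks by several orders of magnitude while the bottom one barely moves after a few thousand steps, which is the energy-cascade smoothing the abstract describes.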

Citations (1)
