
On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression (2006.05800v4)

Published 10 Jun 2020 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: We consider the linear model $\mathbf{y} = \mathbf{X} \mathbf{\beta}_\star + \mathbf{\epsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\mathbf{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\mathbf{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\mathbf{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\mathbf{\Sigma}_x$ and anisotropic prior on the true coefficients $\mathbb{E}\mathbf{\beta}_\star\mathbf{\beta}_\star^T = \mathbf{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\mathbf{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\rm opt}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\rm opt}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when both $\mathbf{X}$ and $\mathbf{\beta}_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
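
To make the estimator in the abstract concrete, the following is a minimal NumPy sketch of the generalized ridge formula, $(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{\Sigma}_w)^\dagger \mathbf{X}^T\mathbf{y}$. It is an illustration only, not the authors' code: the function names `weighted_ridge` and `demo`, the isotropic Gaussian design, the noise level, and the identity weighting matrix are all assumptions chosen for the example, and it does not compute the asymptotic risk or the optimal $\lambda$ or $\mathbf{\Sigma}_w$ described in the paper.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized ridge estimate (X^T X + lam * Sigma_w)^+ X^T y.

    The Moore-Penrose pseudoinverse is used, as in the abstract's formula,
    so the expression is defined even when the matrix is singular.
    """
    gram = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(gram) @ (X.T @ y)


def demo(seed=0, n=50, p=100, lam=1.0):
    """Hypothetical overparameterized example (p > n) with synthetic data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                   # assumed isotropic design
    beta_star = rng.standard_normal(p) / np.sqrt(p)   # assumed true coefficients
    y = X @ beta_star + 0.1 * rng.standard_normal(n)  # assumed noise level 0.1
    # Identity weighting recovers standard ridge regression.
    beta_hat = weighted_ridge(X, y, lam, np.eye(p))
    return X, y, beta_hat
```

With `Sigma_w` set to the identity this is ordinary ridge regression, so for `lam > 0` the result should agree with the dual form $\mathbf{X}^T(\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I})^{-1}\mathbf{y}$. Passing a different positive semidefinite `Sigma_w` penalizes coefficient directions unequally, which is the weighted objective the abstract refers to.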

Citations (115)

