Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network (2007.02486v2)

Published 6 Jul 2020 in stat.ML and cs.LG

Abstract: Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the $L_2$ estimation error with respect to the GD iterations, which is bounded away from zero without a delicate scheme of early stopping. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of $L_2$ estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves training robustness and works for a wider range of neural networks.
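
To make the setting in the abstract concrete, here is a minimal NumPy sketch of $\ell_2$-regularized GD on a wide one-hidden-layer ReLU network fitted to noisy data. The abstract does not give the exact objective, so several choices below are assumptions for illustration: the penalty is taken on the distance from initialization, only the hidden-layer weights are trained, the output weights are fixed random signs, and the data, width, step size, and `mu` are arbitrary. The names `relu_net` and `train_gd` are hypothetical and do not come from the paper.

```python
import numpy as np

def relu_net(W, a, X):
    """One-hidden-layer ReLU network: f(x) = a^T relu(W x) / sqrt(m)."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def train_gd(X, y, W0, a, mu=0.0, lr=0.05, steps=500):
    """Full-batch GD on 0.5*||f(X) - y||^2 + 0.5*mu*||W - W0||_F^2.

    Assumption: the penalty pulls W toward its initialization W0, and only
    the hidden-layer weights W are trained (a stays fixed).
    """
    m = W0.shape[0]
    W = W0.copy()
    for _ in range(steps):
        pre = X @ W.T                                      # (n, m) pre-activations
        resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y  # (n,) residuals
        # d(loss)/dW[r] = sum_i resid_i * a_r * 1{pre_ir > 0} * x_i / sqrt(m)
        grad = ((resid[:, None] * (pre > 0)) * a).T @ X / np.sqrt(m)
        W -= lr * (grad + mu * (W - W0))
    return W

# Illustrative noisy regression problem on the unit circle (not from the paper).
rng = np.random.default_rng(0)
n, d, m = 20, 2, 2000
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y = np.sin(2.0 * theta) + 0.3 * rng.standard_normal(n)

W0 = rng.standard_normal((m, d))       # random Gaussian initialization
a = rng.choice([-1.0, 1.0], size=m)    # fixed random output signs

W_plain = train_gd(X, y, W0, a, mu=0.0)  # unregularized GD
W_reg = train_gd(X, y, W0, a, mu=1.0)    # l2-regularized GD

mse_plain = np.mean((relu_net(W_plain, a, X) - y) ** 2)
mse_reg = np.mean((relu_net(W_reg, a, X) - y) ** 2)
dist_plain = np.linalg.norm(W_plain - W0)
dist_reg = np.linalg.norm(W_reg - W0)
print(f"train MSE   plain={mse_plain:.4f}  reg={mse_reg:.4f}")
print(f"||W - W0||  plain={dist_plain:.4f}  reg={dist_reg:.4f}")
```

In this sketch the regularized run fits the noisy labels less tightly and keeps the weights closer to initialization than the unregularized run, which is the qualitative behavior one would expect from a ridge-type penalty. It does not reproduce the paper's experiments or verify its rates.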

Authors (4)

Tianyang Hu (40 papers)
Wenjia Wang (68 papers)
Cong Lin (12 papers)
Guang Cheng (136 papers)

Citations (44)
