Explicit Regularization in Overparametrized Models via Noise Injection (2206.04613v3)
Abstract: Injecting noise into gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small perturbations induce explicit regularization for simple models, corresponding to the L1-norm, group-L1 norms, or nuclear norm. However, when applied to overparametrized neural networks with large widths, we show that the same perturbations cause variance explosion. To overcome this, we propose using independent layer-wise perturbations, which provably allow for explicit regularization without variance explosion. Our empirical results show that these small perturbations lead to improved generalization performance compared to vanilla gradient descent.
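The core idea, injecting noise before the gradient computation with an independent perturbation per layer, can be illustrated with a minimal sketch. This is an illustrative reading of the abstract, not the paper's implementation: it assumes a two-layer linear network with squared loss, and interprets "layer-wise" as evaluating each layer's gradient with noise injected only into that layer.

```python
# Sketch of layer-wise perturbed gradient descent (illustrative assumptions:
# 2-layer linear model f(X) = X @ W1 @ W2 with squared loss; each layer's
# gradient is computed with Gaussian noise added only to that layer).
import numpy as np

rng = np.random.default_rng(0)

def loss(W1, W2, X, y):
    return 0.5 * np.mean((X @ W1 @ W2 - y) ** 2)

def grads(W1, W2, X, y):
    r = (X @ W1 @ W2 - y) / len(y)      # scaled residual, shape (n,)
    g1 = X.T @ np.outer(r, W2)          # dL/dW1, shape (d, h)
    g2 = W1.T @ (X.T @ r)               # dL/dW2, shape (h,)
    return g1, g2

def layerwise_perturbed_step(W1, W2, X, y, lr=0.1, sigma=0.01):
    # Gradient w.r.t. W1, evaluated with noise injected into W1 only
    g1, _ = grads(W1 + sigma * rng.standard_normal(W1.shape), W2, X, y)
    # Gradient w.r.t. W2, evaluated with noise injected into W2 only
    _, g2 = grads(W1, W2 + sigma * rng.standard_normal(W2.shape), X, y)
    return W1 - lr * g1, W2 - lr * g2

# Usage: a few perturbed steps on synthetic data still decrease the loss
X = rng.standard_normal((50, 5))
y = X @ rng.standard_normal(5)
W1 = 0.1 * rng.standard_normal((5, 3))
W2 = 0.1 * rng.standard_normal(3)
initial = loss(W1, W2, X, y)
for _ in range(100):
    W1, W2 = layerwise_perturbed_step(W1, W2, X, y)
final = loss(W1, W2, X, y)
```

Sampling the noise independently for each layer (rather than perturbing all parameters at once) is what the abstract credits with avoiding variance explosion at large widths.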
- Antonio Orvieto
- Anant Raj
- Hans Kersting
- Francis Bach