Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations (2102.13233v1)

Published 25 Feb 2021 in cs.LG

Abstract: In this paper, it is shown theoretically that spurious local minima are common for deep fully-connected networks and convolutional neural networks (CNNs) with piecewise linear activation functions and datasets that cannot be fitted by linear models. A motivating example is given to explain the reason for the existence of spurious local minima: each output neuron of deep fully-connected networks and CNNs with piecewise linear activations produces a continuous piecewise linear (CPWL) output, and different pieces of CPWL output can fit disjoint groups of data samples when minimizing the empirical risk. Fitting data samples with different CPWL functions usually results in different levels of empirical risk, leading to prevalence of spurious local minima. This result is proved in general settings with any continuous loss function. The main proof technique is to represent a CPWL function as a maximization over minimization of linear pieces. Deep ReLU networks are then constructed to produce these linear pieces and implement maximization and minimization operations.

Citations (7)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations (2102.13233v1)

Summary

Related Papers