Sample complexity of training non-linear neural networks

Determine how many training samples are required for a non-linear neural network to achieve reliable generalization, characterizing the sample complexity as a function of the network architecture and the data distribution (for example, for deep feed-forward networks with ReLU activations).

Background

The paper highlights that, despite extensive practical success, core statistical aspects of neural networks remain insufficiently understood. A central issue is quantifying how the amount of training data affects generalization in non-linear neural networks.

Prior work has provided various generalization upper bounds with dependencies on norms, depth, or width, as well as minimax lower bounds in limited settings (e.g., linear activations or sinusoidal networks). The authors address this question for ReLU feed-forward networks by deriving a minimax lower bound that scales as √(log d / n), which aligns with recent upper bounds and is supported by empirical results. This partially resolves the question in the ReLU feed-forward setting, while the broader problem for general non-linear architectures remains a fundamental open direction.
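Schematically, and suppressing constants, norms, and the precise function class (a sketch of the stated scaling, not the paper's exact theorem statement), a minimax lower bound of this form reads:

\[
\inf_{\hat{f}} \; \sup_{f^{\ast}} \; \mathbb{E}\,\mathrm{err}\bigl(\hat{f}, f^{\ast}\bigr) \;\gtrsim\; \sqrt{\frac{\log d}{n}},
\]

where the infimum ranges over estimators, the supremum over ReLU feed-forward target networks, n is the sample size, and d is the relevant dimension parameter.

To illustrate the kind of empirical check alluded to above, the following is a minimal, hypothetical sketch (not the paper's experiments; the teacher-student setup, network sizes, noise level, and step counts are all assumptions) that trains a one-hidden-layer ReLU network on synthetic data for several sample sizes n and reports how the test error shrinks:

import numpy as np

rng = np.random.default_rng(0)
d, width = 20, 32  # input dimension and hidden width (assumed values)

# Fixed "teacher" ReLU network that generates the regression targets.
W_t = rng.normal(size=(width, d)) / np.sqrt(d)
a_t = rng.normal(size=width) / np.sqrt(width)

def teacher(X):
    return np.maximum(X @ W_t.T, 0.0) @ a_t

def train(X, y, steps=2000, lr=0.05):
    """Fit a student ReLU network of the same shape by plain gradient descent on squared loss."""
    W = rng.normal(size=(width, d)) / np.sqrt(d)
    a = rng.normal(size=width) / np.sqrt(width)
    n = len(y)
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)   # hidden activations, shape (n, width)
        r = H @ a - y                  # residuals
        grad_a = H.T @ r / n
        grad_W = ((r[:, None] * a) * (H > 0.0)).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

X_test = rng.normal(size=(5000, d))
y_test = teacher(X_test)

for n in [50, 200, 800, 3200]:
    X = rng.normal(size=(n, d))
    y = teacher(X) + 0.1 * rng.normal(size=n)  # noisy labels
    W, a = train(X, y)
    pred = np.maximum(X_test @ W.T, 0.0) @ a
    print(f"n={n:5d}  test MSE={np.mean((pred - y_test) ** 2):.4f}")

Plotting test error against n on a log-log scale in such a sweep is one way to compare an observed decay rate with a theoretical 1/√n-type scaling.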

References

For example, a basic yet very important open question is: how many training samples are needed to train a (non-linear) neural network?

How many samples are needed to train a deep neural network? (Golestaneh et al., arXiv:2405.16696, 26 May 2024), Section 1, Introduction.