Link approximation-theoretic depth advantages to sample complexity under gradient descent
Determine how known approximation-theoretic separations in favor of deeper architectures translate into concrete improvements in sample complexity for neural networks trained via gradient descent, specifying the training dynamics and conditions under which such improvements can be rigorously established.
References
However, it remains unclear how these approximation gaps translate into sample complexity ones for neural networks when trained through gradient descent.
— The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent
(arXiv:2502.13961, Dandi et al., 19 Feb 2025), in Related Works, three-layer networks paragraph