
Link approximation-theoretic depth advantages to sample complexity under gradient descent

Determine how known approximation-theoretic separations in favor of deeper architectures translate into concrete improvements in sample complexity for neural networks trained via gradient descent, specifying the training dynamics and conditions under which such improvements can be rigorously established.


Background

Prior work has established that deeper networks enjoy approximation advantages over shallower ones, but it is not theoretically settled how these advantages carry over to statistical efficiency when networks are trained with gradient-based methods. The authors emphasize that this gap motivates their study of hierarchical targets, where depth facilitates feature learning and dimensionality reduction.

The unresolved question calls for a principled bridge between approximation theory and learning dynamics, clarifying when and why depth yields better sample complexity under gradient descent in practice.
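
As a minimal empirical sketch of what such a sample-complexity gap would look like, one can compare a 2-layer and a 3-layer network trained by full-batch gradient descent on a synthetic hierarchical target across training-set sizes. The target form f*(x) = g(ReLU(W*x)), the architectures, and the hyperparameters below are illustrative assumptions, not the paper's setup; the open question asks when such gaps can be established rigorously rather than only observed numerically.

```python
# Illustrative probe (assumed setup, not the paper's experiments): compare how
# test error scales with sample size for a shallow vs. a deeper MLP trained by
# plain gradient descent on a synthetic hierarchical target
#   f*(x) = g(h(x)),  h(x) = ReLU(W* x) with W* of rank r << d,  g a quadratic.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r = 64, 4  # ambient dimension, low-dimensional feature dimension

# Hierarchical target: a low-dimensional nonlinear feature followed by a simple head.
W_star = torch.randn(r, d) / d**0.5
a_star = torch.randn(r)

def target(x):
    h = torch.relu(x @ W_star.T)                      # inner feature map h(x)
    return (h @ a_star) + 0.5 * (h ** 2).sum(dim=1)   # outer function g

def make_data(n):
    x = torch.randn(n, d)
    return x, target(x)

def mlp(depth, width=256):
    layers, in_dim = [], d
    for _ in range(depth - 1):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers += [nn.Linear(in_dim, 1)]
    return nn.Sequential(*layers)

def train_and_eval(model, x_tr, y_tr, x_te, y_te, steps=3000, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # full-batch gradient descent
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x_tr).squeeze(-1), y_tr).backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(x_te).squeeze(-1), y_te).item()

x_te, y_te = make_data(5000)
for n in [200, 1000, 5000]:
    x_tr, y_tr = make_data(n)
    err2 = train_and_eval(mlp(depth=2), x_tr, y_tr, x_te, y_te)
    err3 = train_and_eval(mlp(depth=3), x_tr, y_tr, x_te, y_te)
    print(f"n={n:5d}  2-layer test MSE={err2:.3f}  3-layer test MSE={err3:.3f}")
```

A smaller test error for the deeper model at a fixed sample size n would be the empirical signature of the sample-complexity improvement whose rigorous characterization, including the relevant training dynamics and conditions, remains open.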

References

However, it remains unclear how these approximation gaps translate into sample complexity ones for neural networks when trained through gradient descent.

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent (2502.13961 - Dandi et al., 19 Feb 2025) in Related Works, 3-Layers networks paragraph