Quantify the computational advantage of depth under gradient-based training

Establish a rigorous quantitative characterization, in an analytically tractable setting, of the computational advantage of deep neural networks trained with gradient-based methods relative to shallow models, specifying the criteria (e.g., sample complexity or generalization performance) by which the advantage is measured and the assumptions under which it holds.
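
Concretely, such a characterization is typically phrased as a sample-complexity separation. The template below illustrates the desired form of a result; it is a sketch of the target statement, not a result asserted by the paper:

\[
n_{\mathrm{deep}}(\varepsilon) = O\!\big(d^{\,a}\big)
\quad \text{versus} \quad
n_{\mathrm{shallow}}(\varepsilon) = \Omega\!\big(d^{\,b}\big), \qquad b > a,
\]

where \(n(\varepsilon)\) denotes the number of training samples a gradient-trained model needs to reach test error \(\varepsilon\) on the target class and \(d\) is the input dimension.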

Background

The paper motivates a central theoretical challenge: although shallow models can approximate complex functions, deep networks trained with gradient descent show practical advantages that are poorly understood theoretically. The authors introduce hierarchical target classes (SIGHT and MIGHT) to analyze how depth enables progressive dimensionality reduction and improved sample complexity, but they highlight that a general, analyzable setting quantifying the computational advantage of depth remains to be pinned down.
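
For intuition, hierarchical targets of this kind can be sketched as compositions in which a low-dimensional nonlinear feature is extracted first. The form below is a generic illustration of such a composition, not the paper's exact SIGHT or MIGHT definitions:

\[
f^{\star}(x) = g\big(h(x)\big), \qquad h:\mathbb{R}^{d}\to\mathbb{R}^{k}, \quad k \ll d,
\]

for instance \(h(x) = \big(\sigma(\langle w_{1}, x\rangle), \ldots, \sigma(\langle w_{k}, x\rangle)\big)\). A deep network trained with gradient descent can first recover the intermediate map \(h\), coarse-graining the problem from dimension \(d\) to \(k\) before fitting \(g\), whereas a shallow model must approximate the full composition \(g \circ h\) in dimension \(d\) at once.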

This open problem frames the broader contribution of the work: providing a controlled framework where depth yields a hierarchical coarse-graining mechanism and demonstrating separations between deep and shallow training, while calling for a general quantitative theory that extends beyond specific constructions.

References

"A fundamental open problem is thus: Can one quantify the computational advantage of deep models trained with gradient-based methods with respect to shallow models in some analyzable setting?"

Dandi et al., "The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent," arXiv:2502.13961, 19 Feb 2025; quoted from the Introduction (opening section).