Explain the quadratic dependence on sparsity in CNN sample complexity
Derive a theoretical explanation for why the sample complexity $P^*_{\mathrm{CNN}}$ of convolutional neural networks trained to learn the Sparse Random Hierarchy Model (SRHM) scales quadratically with the sparsity parameter $s_0+1$, even though each shared weight is connected to a fraction of the input that is independent of the hierarchy depth $L$. Establish principled conditions under which $P^*_{\mathrm{CNN}} \propto (s_0+1)^2\, n_c\, m^L$ arises from weight sharing combined with the spatial sparsity of informative features in the SRHM.
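For concreteness, the target scaling can be stated against the dense-RHM baseline (a minimal bookkeeping sketch, not a derivation; notation is assumed from the paper, with $n_c$ classes, $m$ synonymous low-level representations per feature, hierarchy depth $L$, and sparsity $s_0$):

$$
P^*_{\mathrm{RHM}} \sim n_c\, m^L
\quad\longrightarrow\quad
P^*_{\mathrm{CNN}} \sim (s_0+1)^2\, n_c\, m^L .
$$

The quantity to be explained is thus the excess factor $(s_0+1)^2$: the dilution of informative patches might naively suggest a single factor of $s_0+1$, so the argument must identify the origin of the second factor.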
References
Qualitatively, the same scenario holds for CNNs. One expects a different sample complexity since each weight is now connected to a fraction of the input that is independent of $L$. Yet, the quadratic dependence on $s_0+1$ remains to be understood.
— How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
(Tomasini et al., arXiv:2404.10727, 16 Apr 2024), Section 6, "Sample complexities arguments"