How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model (2404.10727v2)
Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invariances of the task, such as smooth transformations for image datasets, has been argued to be important for deep networks, and it correlates strongly with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between insensitivity and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
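To make the kind of data described above concrete, here is a minimal, illustrative sketch of a sparse hierarchical generative model in the spirit of the SRHM. The parameter names (vocabulary size `v`, number of synonymic productions `m`, branching factor `s`, number of uninformative "blank" symbols `s0`, depth `L`) and the exact way sparsity is inserted are assumptions made for illustration only, not the paper's precise construction.

```python
import random

BLANK = -1  # uninformative "blank" symbol (illustrative convention)

def sample_rules(v, m, s, L, seed=0):
    """Random production rules: at each of the L levels, every symbol in a
    vocabulary of size v is assigned m distinct s-tuples of lower-level symbols."""
    rng = random.Random(seed)
    rules = []
    for _ in range(L):
        level = {}
        for sym in range(v):
            prods = set()
            while len(prods) < m:
                prods.add(tuple(rng.randrange(v) for _ in range(s)))
            level[sym] = list(prods)
        rules.append(level)
    return rules

def expand(sym, rules, s, s0, rng):
    """Expand one symbol down the remaining levels.  Each chosen production is
    diluted with s0 blanks placed at random positions, so the informative
    features of a class appear at variable locations in the input."""
    depth = len(rules)
    if depth == 0:
        return [sym]
    if sym == BLANK:
        # a blank occupies a whole block of the input at this depth
        return [BLANK] * (s + s0) ** depth
    production = list(rng.choice(rules[0][sym]))  # pick one of the m synonyms
    diluted = production + [BLANK] * s0
    rng.shuffle(diluted)                          # random placement of the blanks
    out = []
    for child in diluted:
        out.extend(expand(child, rules[1:], s, s0, rng))
    return out

def sample_datum(label, rules, s, s0, seed=None):
    """Generate one input sequence of length (s + s0)**L for a given class label."""
    rng = random.Random(seed)
    return expand(label, rules, s, s0, rng)

if __name__ == "__main__":
    v, m, s, s0, L = 8, 2, 2, 1, 3
    rules = sample_rules(v, m, s, L, seed=1)
    x = sample_datum(label=0, rules=rules, s=s, s0=s0, seed=42)
    print(len(x), x)  # length (s + s0)**L = 27; blank positions vary per sample
```

Because the blanks land at random positions independently at every level, two samples of the same class can differ by small local displacements of their informative features, which is the discrete analogue of the smooth spatial transformations mentioned in the abstract.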
Authors: Umberto Tomasini, Matthieu Wyart