On the hardness of learning under symmetries (2401.01869v1)
Abstract: We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural networks has empirically improved the performance of learning pipelines in domains ranging from biology to computer vision. However, a rich yet separate line of learning-theoretic research has demonstrated that actually learning shallow, fully-connected (i.e., non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework that encompasses gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural networks with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, all of which scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.
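For context on the hardness framework invoked in the abstract, the following is a brief sketch of the correlational statistical query (CSQ) model as it is standardly defined in the literature; the notation here is illustrative and not taken from the paper itself. A CSQ learner never sees labeled examples directly. Instead, for a target function $f$ and input distribution $\mathcal{D}$, it submits a bounded query function $\phi : \mathcal{X} \to [-1,1]$ together with a tolerance $\tau > 0$, and the oracle may return any value $v$ satisfying

$$\left| v - \mathbb{E}_{x \sim \mathcal{D}}\big[\phi(x)\, f(x)\big] \right| \le \tau.$$

Gradient descent on the squared loss fits this framework because the population gradient for a model $h_\theta$,

$$\nabla_\theta\, \mathbb{E}_{x \sim \mathcal{D}}\big[(h_\theta(x) - f(x))^2\big] = 2\, \mathbb{E}_{x \sim \mathcal{D}}\big[\nabla_\theta h_\theta(x)\, h_\theta(x)\big] - 2\, \mathbb{E}_{x \sim \mathcal{D}}\big[\nabla_\theta h_\theta(x)\, f(x)\big],$$

splits into a term that does not involve $f$ and a correlational term that a CSQ oracle can answer coordinate-wise to tolerance $\tau$ (after suitable normalization of $\nabla_\theta h_\theta$). Lower bounds in this model therefore constrain noisy gradient-based training, which is why the hardness results stated in the abstract apply to learning with gradient descent.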