Exploring the Complexity of Deep Neural Networks through Functional Equivalence (2305.11417v3)
Abstract: We investigate the complexity of deep neural networks through the lens of functional equivalence, the observation that distinct parameterizations can realize the same network function. Leveraging this equivalence property, we present a novel bound on the covering number for deep neural networks, which shows that functional equivalence reduces the effective complexity of the network class. Additionally, we demonstrate that functional equivalence benefits optimization: overparameterized networks tend to be easier to train because increasing the network width shrinks the volume of the effective parameter space. These findings offer insight into the phenomenon of overparameterization and have implications for understanding generalization and optimization in deep learning.
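To make the premise of functional equivalence concrete, the following minimal NumPy sketch (illustrative only, not code from the paper) checks two standard weight-space symmetries of a one-hidden-layer ReLU network: permuting the hidden units together with their outgoing weights, and rescaling each unit's incoming weights and bias by a positive constant while dividing its outgoing weights by the same constant. Both transformations change the parameters but leave the computed function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    # One-hidden-layer ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2
    return W2 @ relu(W1 @ x + b1) + b2

d_in, width, d_out = 3, 5, 2
W1 = rng.standard_normal((width, d_in))
b1 = rng.standard_normal(width)
W2 = rng.standard_normal((d_out, width))
b2 = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

# Permutation symmetry: reorder hidden units and their outgoing weights.
perm = rng.permutation(width)
W1_p, b1_p = W1[perm], b1[perm]
W2_p = W2[:, perm]

# Positive rescaling symmetry (ReLU is positively homogeneous):
# scale each unit's incoming weights/bias by c > 0, divide its outgoing weights by c.
c = rng.uniform(0.5, 2.0, size=width)
W1_s, b1_s = W1_p * c[:, None], b1_p * c
W2_s = W2_p / c[None, :]

y_original = forward(x, W1, b1, W2, b2)
y_transformed = forward(x, W1_s, b1_s, W2_s, b2)

# Different parameters, identical network function.
assert np.allclose(y_original, y_transformed)
```

Because many such transformations map a parameter vector to a functionally identical one, covering the function class only requires covering one representative per equivalence class, which is the intuition behind the reduced covering-number bound described in the abstract.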