Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing (2402.17595v1)
Abstract: The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural nets, even without any explicit regularizer in the loss function, converges to the solution of a regularized learning problem. However, known results attempting to theoretically explain this phenomenon focus overwhelmingly on the setting of linear neural nets, and the simplicity of the linear structure is particularly crucial to existing arguments. In this paper, we explore this problem in the context of more realistic neural networks with a general class of non-linear activation functions, and rigorously demonstrate the implicit regularization phenomenon for such networks in the setting of matrix sensing problems, together with rate guarantees that ensure exponentially fast convergence of gradient descent. In this vein, we contribute a network architecture called Spectral Neural Networks (abbrv. SNN) that is particularly suitable for matrix learning problems. Conceptually, this entails coordinatizing the space of matrices by their singular values and singular vectors, as opposed to by their entries, a potentially fruitful perspective for matrix learning. We demonstrate that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets and confirm its effectiveness in the context of matrix sensing, via both mathematical guarantees and empirical investigations. We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
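To make the matrix sensing setup and the spectral viewpoint concrete, here is a minimal NumPy sketch. It assumes a toy parameterization X = U diag(softplus(s)) V^T trained by plain gradient descent on a least-squares sensing loss; the factor shapes, the softplus activation on the singular values, and all hyperparameters are illustrative assumptions and not the paper's exact SNN construction.

```python
# Minimal matrix-sensing sketch with a spectral-style parameterization.
# Illustrative only: the shapes, the softplus on the singular values, and
# the plain gradient-descent loop are assumptions for demonstration and
# are not the paper's exact SNN architecture.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 10, 2, 200        # matrix size, true rank, number of linear measurements

# Ground-truth low-rank matrix X* and measurements y_i = <A_i, X*>.
X_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A = rng.standard_normal((m, n, n))
y = np.einsum('mij,ij->m', A, X_star)

# Spectral-style parameterization: X = U diag(softplus(s)) V^T.
k = 4                        # working rank (over-parameterized relative to r)
U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((n, k))
s = np.zeros(k)

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):              # derivative of softplus
    return 1.0 / (1.0 + np.exp(-x))

lr = 1e-3
for step in range(10000):
    sig = softplus(s)
    X = U @ np.diag(sig) @ V.T
    resid = np.einsum('mij,ij->m', A, X) - y      # measurement residuals
    G = np.einsum('m,mij->ij', resid, A) / m      # dL/dX for L = mean squared residual / 2
    # Chain rule through the spectral factors U, V, s.
    grad_U = G @ V @ np.diag(sig)
    grad_V = G.T @ U @ np.diag(sig)
    grad_s = np.diag(U.T @ G @ V) * sigmoid(s)    # dL/ds_k = (u_k^T G v_k) * softplus'(s_k)
    U -= lr * grad_U
    V -= lr * grad_V
    s -= lr * grad_s

X_hat = U @ np.diag(softplus(s)) @ V.T
print("relative recovery error:",
      np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star))
```

The sketch highlights the conceptual shift described in the abstract: gradient descent acts on singular-value and singular-vector coordinates (with a non-linear activation on the singular values) rather than directly on matrix entries.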