Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks (2303.00196v3)
Abstract: Achieving efficient and robust multi-channel data learning is a challenging task in data science. By exploiting low-rankness in the transformed domain, i.e., transformed low-rankness, tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation and has recently been extended to function representation such as Neural Networks with t-product layers (t-NNs). However, it remains unclear how t-SVD theoretically affects the learning behavior of t-NNs. This paper is the first to answer this question by deriving upper bounds on the generalization error of both standard and adversarially trained t-NNs. It reveals that t-NNs compressed by exact transformed low-rank parameterization can achieve a sharper adversarial generalization bound. In practice, although t-NNs rarely have exactly transformed low-rank weights, our analysis further shows that, under adversarial training with gradient flow (GF), over-parameterized t-NNs with ReLU activations are implicitly regularized towards a transformed low-rank parameterization under certain conditions. We also establish adversarial generalization bounds for t-NNs with approximately transformed low-rank weights. Our analysis indicates that transformed low-rank parameterization is a promising way to enhance robust generalization for t-NNs.
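The abstract builds on t-product layers, in which the matrix product of an ordinary dense layer is replaced by the tensor-tensor product (t-product) computed slice-wise in a transformed domain, and on compressing weights so that each transformed frontal slice has low rank. The sketch below is a minimal NumPy illustration of these two ingredients, assuming the standard DFT along the third mode as the transform; the function names, tensor shapes, and the toy layer are illustrative assumptions, not the paper's exact architecture or implementation.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3):
    FFT along the third mode, per-slice matrix products, inverse FFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    n1, _, n3 = A.shape
    n4 = B.shape[1]
    C_hat = np.empty((n1, n4, n3), dtype=complex)
    for k in range(n3):
        C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_nn_layer(W, X, bias):
    """One ReLU t-product layer: relu(W * X + bias), with * the t-product."""
    return np.maximum(t_product(W, X) + bias, 0.0)

def transformed_lowrank_compress(W, r):
    """Transformed low-rank parameterization (illustrative): truncate each
    frontal slice of the DFT-transformed weight tensor to rank r."""
    W_hat = np.fft.fft(W, axis=2)
    out_hat = np.empty_like(W_hat)
    for k in range(W.shape[2]):
        U, s, Vh = np.linalg.svd(W_hat[:, :, k], full_matrices=False)
        out_hat[:, :, k] = (U[:, :r] * s[:r]) @ Vh[:r, :]
    return np.real(np.fft.ifft(out_hat, axis=2))

# Toy usage: a 3-channel input with 5 features and a batch of 8 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8, 3))   # features x batch x channels
W = rng.standard_normal((4, 5, 3))   # 4 output features
b = rng.standard_normal((4, 1, 1))
out = t_nn_layer(transformed_lowrank_compress(W, r=2), X, b)
print(out.shape)  # (4, 8, 3)
```

In this toy setup, compressing the weights with `transformed_lowrank_compress` keeps only rank-2 frontal slices in the transformed domain, which is the kind of parameterization the abstract links to sharper adversarial generalization bounds.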