Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks (2303.00196v3)

Published 1 Mar 2023 in cs.LG and cs.AI

Abstract: Achieving efficient and robust multi-channel data learning is a challenging task in data science. By exploiting low-rankness in the transformed domain, i.e., transformed low-rankness, the tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation and has recently been extended to function representation, such as neural networks with t-product layers (t-NNs). However, it remains unclear how t-SVD theoretically affects the learning behavior of t-NNs. This paper is the first to answer this question by deriving upper bounds on the generalization error of both standard and adversarially trained t-NNs. It reveals that t-NNs compressed by exact transformed low-rank parameterization can achieve a sharper adversarial generalization bound. In practice, t-NNs rarely have exactly transformed low-rank weights; however, our analysis further shows that, under certain conditions, adversarial training with gradient flow (GF) implicitly regularizes over-parameterized t-NNs with ReLU activations towards transformed low-rank parameterization. We also establish adversarial generalization bounds for t-NNs with approximately transformed low-rank weights. Our analysis indicates that transformed low-rank parameterization is a promising way to enhance robust generalization for t-NNs.
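
As an informal illustration of the building blocks mentioned in the abstract, the NumPy sketch below implements the t-product via an FFT along the third mode, a simple rank truncation of the weights in the transformed domain, and a ReLU t-product layer. The function names (`t_product`, `truncate_transformed_rank`, `t_nn_layer`), the choice of the FFT as the transform, and the toy shapes are assumptions made for illustration only; the paper itself analyses t-NNs under general invertible linear transforms.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3) -> C (n1 x n4 x n3).

    Computed in the transformed domain: FFT along the third mode,
    slice-wise matrix products, then inverse FFT. Other invertible
    linear transforms (e.g. a DCT) can replace the FFT.
    """
    n3 = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    return np.real(np.fft.ifft(C_hat, axis=2))


def truncate_transformed_rank(W, rank):
    """Transformed low-rank compression: truncate every frontal slice of W
    in the FFT domain to the given rank (an SVD-based surrogate for the
    exact transformed low-rank parameterization discussed in the abstract)."""
    W_hat = np.fft.fft(W, axis=2)
    for k in range(W.shape[2]):
        U, s, Vt = np.linalg.svd(W_hat[:, :, k], full_matrices=False)
        W_hat[:, :, k] = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    return np.real(np.fft.ifft(W_hat, axis=2))


def t_nn_layer(W, X):
    """One t-product layer with ReLU activation: ReLU(W * X)."""
    return np.maximum(t_product(W, X), 0.0)


# Toy usage: a rank-truncated layer acting on a third-order input tensor.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5, 4))    # input tensor, 8 x 5 x 4
W = rng.standard_normal((16, 8, 4))   # layer weights, 16 x 8 x 4
W_low = truncate_transformed_rank(W, rank=2)
Y = t_nn_layer(W_low, X)              # output tensor, 16 x 5 x 4
```

After truncation, every frontal slice of the weight tensor in the transformed domain has rank at most `rank`; this is the kind of compressed t-NN for which the paper derives its sharper adversarial generalization bounds.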
