Spectral Norm of Convolutional Layers with Circular and Zero Paddings (2402.00240v1)

Published 31 Jan 2024 in cs.LG and cs.CV

Abstract: This paper leverages \emph{Gram iteration}, an efficient, deterministic, and differentiable method for computing the spectral norm with an upper-bound guarantee. While the Gram iteration was designed for circular convolutional layers, we generalize it to zero-padding convolutional layers and prove its quadratic convergence. We also provide theorems bridging the gap between the spectral norms of circular and zero-padding convolutions. We design a \emph{spectral rescaling} that can be used as a competitive $1$-Lipschitz layer to enhance network robustness. Experiments demonstrate that our method outperforms state-of-the-art techniques in precision, computational cost, and scalability. The code for the experiments is available at https://github.com/blaisedelattre/lip4conv.
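For intuition, below is a minimal sketch of the Gram iteration bound on a dense matrix. Each Gram step squares every singular value, so the Frobenius norm taken to the appropriate root gives an upper bound on the spectral norm that tightens quadratically. The function name and the rescaling bookkeeping are illustrative, not the repository's API; for convolutional layers the paper applies this idea per frequency block of the kernel's FFT, which this sketch does not cover.

```python
import numpy as np

def gram_spectral_norm_bound(W, n_iter=8):
    """Differentiable upper bound on sigma_max(W) via Gram iteration.

    Each step replaces G with G^T G, squaring every singular value, so
    after t steps ||G||_F ** (1 / 2**t) upper-bounds sigma_max(W) and
    converges to it quadratically. Frobenius rescaling at each step
    prevents overflow; the accumulated log-scales are folded back in.
    """
    G = np.asarray(W, dtype=np.float64)
    log_scale = 0.0
    for _ in range(n_iter):
        f = np.linalg.norm(G)      # Frobenius norm, used as rescaling factor
        G = G / f                  # keep entries bounded across iterations
        log_scale = 2.0 * (log_scale + np.log(f))
        G = G.T @ G                # Gram step: squares all singular values
    # sigma_max(W) ** (2 ** n_iter) <= ||G||_F * exp(log_scale)
    return np.exp((np.log(np.linalg.norm(G)) + log_scale) / 2.0 ** n_iter)

# Example: the bound sits just above, and converges to, the true spectral norm.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
print(gram_spectral_norm_bound(W, n_iter=8), np.linalg.norm(W, ord=2))
```

Unlike power iteration, this estimate is deterministic and always an upper bound, which is what makes it usable for certified Lipschitz control during training.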
