Towards Better Orthogonality Regularization with Disentangled Norm in Training Deep CNNs (2306.09939v1)
Abstract: Orthogonality regularization has been developed to mitigate training instability and feature redundancy in deep CNNs. Among existing proposals, kernel orthogonality regularization enforces orthogonality by minimizing the residual between the Gram matrix of the convolutional filters and the identity matrix. We propose a novel measure for achieving better orthogonality among filters, which disentangles the diagonal and correlation information in this residual. Under the principle of imposing strict orthogonality between filters, models equipped with this measure come closer to orthogonality than those trained with previous regularization methods. Moreover, while improved strict filter orthogonality benefits relatively shallow models, the performance gains from strict kernel orthogonality drop sharply as model depth increases. Based on this apparent conflict between strict kernel orthogonality and growing model capacity, we propose a relaxation theory for kernel orthogonality regularization. The relaxed kernel orthogonality improves performance on models with increased capacity, shedding light on the burden strict kernel orthogonality places on deep models. We conduct extensive experiments with our kernel orthogonality regularization toolkit on ResNet and WideResNet on CIFAR-10 and CIFAR-100. The toolkit, which includes both strict and relaxed orthogonality regularization, yields state-of-the-art performance gains and more robust models with expressive features. These experiments demonstrate the efficacy of the toolkit and provide insight into an often-overlooked challenge: the burden strict orthogonality places on capacity-rich models.
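The abstract does not spell out the disentangled penalty, so the following is a minimal PyTorch sketch of one plausible formulation: the residual G − I between the filters' Gram matrix and the identity is split into a diagonal term (each filter's deviation from unit norm) and an off-diagonal term (pairwise filter correlations), each with its own weight. The function name and the weights `lambda_diag` and `lambda_corr` are illustrative assumptions, not the authors' exact method.

```python
import torch

def disentangled_orthogonality_loss(kernel: torch.Tensor,
                                    lambda_diag: float = 1.0,
                                    lambda_corr: float = 1.0) -> torch.Tensor:
    """Hypothetical disentangled kernel-orthogonality penalty (a sketch,
    not the paper's exact measure).

    kernel: conv weight of shape (out_channels, in_channels, kH, kW).
    """
    out_channels = kernel.shape[0]
    # Flatten each filter into a row vector: W has shape (out, in*kH*kW).
    w = kernel.reshape(out_channels, -1)
    # Residual between the Gram matrix of the filters and the identity.
    gram = w @ w.t()
    eye = torch.eye(out_channels, device=kernel.device, dtype=kernel.dtype)
    residual = gram - eye
    # Diagonal part: deviation of each filter's squared norm from 1.
    diag = torch.diagonal(residual)
    diag_term = diag.pow(2).sum()
    # Off-diagonal part: correlations between distinct filters.
    corr_term = (residual - torch.diag(diag)).pow(2).sum()
    return lambda_diag * diag_term + lambda_corr * corr_term
```

With `lambda_diag = lambda_corr` this reduces to the usual entangled Frobenius penalty ||WWᵀ − I||²_F; down-weighting one term relative to the other is one natural way to express the relaxation the abstract argues for on deeper, capacity-rich models.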
Authors: Changhao Wu, Shenan Zhang, Fangsong Long, Ziliang Yin, Tuo Leng