
Kernel Normalized Convolutional Networks (2205.10089v4)

Published 20 May 2022 in cs.LG and cs.CV

Abstract: Existing convolutional neural network architectures frequently rely on batch normalization (BatchNorm) to train effectively. BatchNorm, however, performs poorly with small batch sizes and is inapplicable to differential privacy. To address these limitations, we propose kernel normalization (KernelNorm) and kernel normalized convolutional layers, and incorporate them into kernel normalized convolutional networks (KNConvNets) as the main building blocks. We implement KNConvNets corresponding to the state-of-the-art ResNets while forgoing the BatchNorm layers. Through extensive experiments, we show that KNConvNets achieve higher or competitive performance compared to their BatchNorm counterparts in image classification and semantic segmentation. They also significantly outperform their batch-independent competitors, including those based on layer and group normalization, in both non-private and differentially private training. KernelNorm thus combines the batch-independence of layer and group normalization with the performance advantage of BatchNorm.
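The abstract does not spell out the normalization formula, but it can be read as normalizing each kernel-sized input patch by its own mean and variance before the convolution weights are applied, which is what makes the layer independent of the batch dimension. The sketch below is a hypothetical PyTorch implementation under that assumption; the class name KNConv2d and all hyperparameters are illustrative and not the authors' reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KNConv2d(nn.Module):
    """Hypothetical kernel normalized convolution: every kernel-sized input
    patch is normalized by its own mean and variance (no batch statistics),
    then the convolution weights are applied to the normalized patch."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, eps=1e-5):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.eps = eps
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        n, _, h, w = x.shape
        # (N, C*k*k, L): each column holds one kernel-sized patch.
        patches = F.unfold(x, self.kernel_size,
                           stride=self.stride, padding=self.padding)
        # Per-patch statistics over the C*k*k elements -> batch-independent.
        var, mean = torch.var_mean(patches, dim=1, keepdim=True, unbiased=False)
        patches = (patches - mean) / torch.sqrt(var + self.eps)
        # Apply the convolution weights as a matrix product over the
        # normalized patches: (O, C*k*k) @ (N, C*k*k, L) -> (N, O, L).
        out = self.weight.view(self.weight.size(0), -1) @ patches
        out = out + self.bias.view(1, -1, 1)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, h_out, w_out)


# Illustrative usage: a drop-in replacement for a Conv2d + BatchNorm2d pair.
x = torch.randn(4, 3, 32, 32)
layer = KNConv2d(3, 16, kernel_size=3, padding=1)
print(layer(x).shape)  # torch.Size([4, 16, 32, 32])
```

Because every patch is normalized with its own statistics, nothing in the layer depends on the batch dimension, which is consistent with the abstract's claims about small-batch and differentially private training.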
