Ghost Noise for Regularizing Deep Neural Networks (2305.17205v2)

Published 26 May 2023 in cs.LG

Abstract: Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks. The regularization effect of BN depends on the batch size and explicitly using smaller batch sizes with Batch Normalization, a method known as Ghost Batch Normalization (GBN), has been found to improve generalization in many settings. We investigate the effectiveness of GBN by disentangling the induced "Ghost Noise" from normalization and quantitatively analyzing the distribution of noise as well as its impact on model performance. Inspired by our analysis, we propose a new regularization technique called Ghost Noise Injection (GNI) that imitates the noise in GBN without incurring the detrimental train-test discrepancy effects of small batch training. We experimentally show that GNI can provide a greater generalization benefit than GBN. Ghost Noise Injection can also be beneficial in otherwise non-noisy settings such as layer-normalized networks, providing additional evidence of the usefulness of Ghost Noise in Batch Normalization as a regularizer.
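
The regularization the paper studies originates in Ghost Batch Normalization, where a large training batch is split into smaller "ghost" batches and each one is normalized with its own statistics, injecting extra sampling noise during training. The following PyTorch sketch illustrates that baseline only (not the proposed Ghost Noise Injection, whose mechanism is not specified in the abstract); the class name, the ghost_batch_size parameter, and the per-chunk update of running statistics are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GhostBatchNorm2d(nn.Module):
    """Illustrative Ghost Batch Normalization: each small "ghost" batch is
    normalized with its own statistics, which adds noise during training."""

    def __init__(self, num_features, ghost_batch_size=32, eps=1e-5, momentum=0.1):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        # A single BatchNorm module is reused for every ghost batch so the
        # affine parameters and running statistics are shared across chunks.
        self.bn = nn.BatchNorm2d(num_features, eps=eps, momentum=momentum)

    def forward(self, x):
        if self.training and x.size(0) > self.ghost_batch_size:
            chunks = x.split(self.ghost_batch_size, dim=0)
            # Normalize each chunk with its own mean and variance; running
            # statistics are updated once per chunk (a simplification).
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        # Evaluation (or small batches): plain BN with running statistics,
        # which is where the train-test discrepancy of small-batch BN arises.
        return self.bn(x)
```

For example, GhostBatchNorm2d(64, ghost_batch_size=16) applied to a batch of 128 feature maps computes normalization statistics over eight independent 16-sample groups instead of once over the full batch.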

Authors (3)
  1. Atli Kosson (9 papers)
  2. Dongyang Fan (8 papers)
  3. Martin Jaggi (155 papers)
Citations (1)
