
Context Normalization Layer with Applications (2303.07651v2)

Published 14 Mar 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Normalization is a pre-processing step that converts data into a more usable representation. Within deep neural networks (DNNs), the batch normalization (BN) technique uses normalization to address the problem of internal covariate shift. BN can be packaged as a general module and has been extensively integrated into various DNNs to stabilize and accelerate training, presumably leading to improved generalization. However, the effect of BN depends on the mini-batch size, and BN does not take into account any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique for image data, called context normalization. This approach adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance by adapting the data values to the context of the target task. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
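
The abstract describes scaling features per sample according to the context the sample belongs to, rather than relying on mini-batch statistics as BN does. As a rough illustration only, not the paper's exact formulation, the sketch below applies per-sample normalization with context-conditioned scale and shift parameters, in the spirit of conditional instance normalization. The class name ContextNorm2d, the num_contexts argument, and the embedding-based parameterization are assumptions made for this example.

import torch
import torch.nn as nn

class ContextNorm2d(nn.Module):
    """Illustrative sketch of a context-conditioned normalization layer.

    Each sample is normalized with its own spatial statistics (no dependence
    on mini-batch size), then scaled and shifted with parameters selected by
    the sample's context id. This is an assumption-based approximation of the
    idea in the abstract, not the authors' exact method.
    """

    def __init__(self, num_features, num_contexts, eps=1e-5):
        super().__init__()
        # One (gamma, beta) pair per context (hypothetical parameterization).
        self.gamma = nn.Embedding(num_contexts, num_features)
        self.beta = nn.Embedding(num_contexts, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)
        self.eps = eps

    def forward(self, x, context_id):
        # x: (N, C, H, W); context_id: (N,) integer context of each sample.
        mean = x.mean(dim=(2, 3), keepdim=True)
        var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        # Scale and shift according to the sample's context.
        g = self.gamma(context_id).unsqueeze(-1).unsqueeze(-1)
        b = self.beta(context_id).unsqueeze(-1).unsqueeze(-1)
        return g * x_hat + b

Usage would resemble cn = ContextNorm2d(64, num_contexts=10) followed by y = cn(x, context_id), where context_id identifies the group or cluster each image belongs to. The key contrast with BN is that the statistics are computed per sample, so behavior does not degrade with small mini-batches, while the affine parameters adapt the representation to each context.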

Authors (5)
  1. Bilal Faye (10 papers)
  2. Hanane Azzag (18 papers)
  3. Mustapha Lebbah (30 papers)
  4. Djamel Bouchaffra (6 papers)
  5. Mohamed-djallel Dilmi (2 papers)
