NoisyNN: Exploring the Impact of Information Entropy Change in Learning Systems (2309.10625v4)

Published 19 Sep 2023 in cs.AI and cs.CV

Abstract: We investigate the impact of entropy change in deep learning systems by injecting noise at different levels, including the embedding space and the image. The series of models that employ our methodology are collectively known as Noisy Neural Networks (NoisyNN), with examples such as NoisyViT and NoisyCNN. Noise is conventionally viewed as a harmful perturbation in various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers (ViTs), and in different learning tasks such as image classification and transfer learning. However, this work shows that noise can be an effective way to change the entropy of the learning system. We demonstrate that specific noise can boost the performance of various deep models under certain conditions. We theoretically prove that the enhancement gained from positive noise comes from reducing the task complexity defined by information entropy, and we experimentally show significant performance gains on large image datasets such as ImageNet. Herein, we use information entropy to define the complexity of the task. We categorize noise into two types, positive noise (PN) and harmful noise (HN), based on whether the noise can help reduce the task complexity. Extensive experiments on CNNs and ViTs show performance improvements from proactively injecting positive noise, where we achieve an unprecedented top-1 accuracy of 95% on ImageNet. Both theoretical analysis and empirical evidence confirm that the presence of positive noise can benefit the learning process, while the traditionally perceived harmful noise indeed impairs deep learning models. The different roles of noise offer new explanations for deep models on specific tasks and provide a new paradigm for improving model performance. Moreover, it reminds us that we can influence the performance of learning systems via information entropy change.
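
The abstract describes injecting noise at the embedding level of a vision transformer (NoisyViT) and classifying it as positive or harmful by whether it reduces the entropy-defined task complexity. The sketch below is a minimal illustration only: the abstract does not specify how positive noise is constructed, so the Gaussian perturbation and the names `NoisyEmbedding`, `ToyPatchEmbed`, `make_noise`, and `noise_scale` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class NoisyEmbedding(nn.Module):
    """Wrap a patch-embedding module and add noise to its output tokens."""

    def __init__(self, embed: nn.Module, noise_scale: float = 0.1):
        super().__init__()
        self.embed = embed
        self.noise_scale = noise_scale

    def make_noise(self, tokens: torch.Tensor) -> torch.Tensor:
        # Hypothetical stand-in for a "positive noise" generator; in the
        # paper's framing, the criterion is whether the injected noise
        # reduces the information-entropy-based task complexity.
        return self.noise_scale * torch.randn_like(tokens)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(x)                  # (B, N, D) token embeddings
        return tokens + self.make_noise(tokens)


class ToyPatchEmbed(nn.Module):
    """Toy 16x16 patch projection producing (B, N, D) tokens."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).flatten(2).transpose(1, 2)


noisy_embed = NoisyEmbedding(ToyPatchEmbed(), noise_scale=0.1)
tokens = noisy_embed(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 64])
```

In the paper's terms, whether such a perturbation counts as positive noise (PN) or harmful noise (HN) is judged by its effect on task complexity, not by its magnitude alone; the wrapper above only shows where in a ViT-style pipeline the injection would sit.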

Authors (6)
  1. Xiaowei Yu (36 papers)
  2. Yao Xue (4 papers)
  3. Tianming Liu (161 papers)
  4. Dajiang Zhu (68 papers)
  5. Zhe Huang (57 papers)
  6. Minheng Chen (13 papers)
Citations (5)
