Linearly Constrained Weights: Reducing Activation Shift for Faster Training of Neural Networks (2403.13833v1)

Published 8 Mar 2024 in cs.NE, cs.LG, and stat.ML

Abstract: In this paper, we first identify activation shift, a simple but remarkable phenomenon in a neural network in which the preactivation value of a neuron has a non-zero mean that depends on the angle between the weight vector of the neuron and the mean of the activation vector in the previous layer. We then propose linearly constrained weights (LCW) to reduce the activation shift in both fully connected and convolutional layers. The impact of reducing the activation shift in a neural network is studied from the perspective of how the variance of variables in the network changes through layer operations in both the forward and backward chains. We also discuss its relationship to the vanishing gradient problem. Experimental results show that LCW enables a deep feedforward network with sigmoid activation functions to be trained efficiently by resolving the vanishing gradient problem. Moreover, combined with batch normalization, LCW improves the generalization performance of both feedforward and convolutional networks.
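The sketch below is a minimal NumPy illustration of the activation-shift idea described in the abstract, not the authors' implementation. It assumes the constraint takes the form "each neuron's weight vector sums to zero" (i.e., the weight vector is orthogonal to the all-ones direction), which it approximates by subtracting each row's mean; see the paper for the exact formulation of LCW.

```python
# Hedged sketch: measures the activation shift of a layer fed by sigmoid
# activations, with ordinary weights versus zero-sum ("linearly constrained")
# weights obtained by mean-centering each weight vector.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, batch = 256, 128, 10_000

# Activations from a previous sigmoid layer: every unit has mean ~0.5,
# so the mean activation vector points roughly along the all-ones direction.
a_prev = 1.0 / (1.0 + np.exp(-rng.normal(size=(batch, n_in))))

# Ordinary Glorot-style weights.
W = rng.normal(scale=np.sqrt(2.0 / (n_in + n_out)), size=(n_out, n_in))

# Zero-sum weights: subtract each row's mean so that every weight vector
# is orthogonal to the all-ones direction (assumed form of the constraint).
W_lc = W - W.mean(axis=1, keepdims=True)

z = a_prev @ W.T        # preactivations with ordinary weights
z_lc = a_prev @ W_lc.T  # preactivations with zero-sum weights

# Average magnitude of the per-neuron preactivation mean (the "shift").
print("mean |E[z]| (ordinary):", np.abs(z.mean(axis=0)).mean())
print("mean |E[z]| (zero-sum):", np.abs(z_lc.mean(axis=0)).mean())
```

Because E[z_i] = w_i · E[a] and E[a] is close to a constant vector for sigmoid activations, removing the component of w_i along the all-ones direction drives the preactivation mean toward zero, so the second printed value is typically orders of magnitude smaller than the first.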
