
Effective Learning with Node Perturbation in Multi-Layer Neural Networks (2310.00965v4)

Published 2 Oct 2023 in cs.LG

Abstract: Backpropagation (BP) remains the dominant and most successful method for training the parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply when training networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by injecting noise into network activations and measuring the induced change in loss. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data-inefficient and unstable due to its unguided, noise-based search process. In this work, we investigate different formulations of NP, relate them to the concept of directional derivatives, and combine NP with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives, together with input decorrelation at every layer, strongly enhances the performance of NP learning, yielding large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.
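
As a rough illustration of the vanilla NP procedure described in the abstract (not the paper's directional-derivative or decorrelated variants), the sketch below trains a small two-layer network using two forward passes per update: a clean pass, a pass with Gaussian noise injected into each layer's pre-activations, and a weight change proportional to the induced loss change times the injected noise and the layer's input. All names and hyperparameters (sigma, lr, the 4-16-2 architecture) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of vanilla node perturbation (NP) on a two-layer ReLU network.
# Hyperparameters and network size are assumed, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
sigma, lr = 1e-3, 1e-2                      # noise scale and learning rate (assumed)

def forward(x, Ws, noise=None):
    """Forward pass; optionally inject Gaussian noise into each layer's pre-activation."""
    acts, h = [x], x
    for l, W in enumerate(Ws):
        a = W @ h
        if noise is not None:
            a = a + noise[l]                # noise injection into node activations
        h = np.maximum(a, 0.0) if l < len(Ws) - 1 else a   # ReLU hidden, linear output
        acts.append(h)
    return acts

def np_update(x, y, Ws):
    """One NP step: two forward passes, no derivatives of the network itself."""
    clean = forward(x, Ws)
    noise = [sigma * rng.standard_normal(W.shape[0]) for W in Ws]
    noisy = forward(x, Ws, noise)
    # Induced loss change between noisy and clean passes (squared error loss).
    dL = 0.5 * np.sum((noisy[-1] - y) ** 2) - 0.5 * np.sum((clean[-1] - y) ** 2)
    for l, W in enumerate(Ws):
        # Weight change ~ -(induced loss change) x (injected noise) x (layer input).
        W -= lr * (dL / sigma**2) * np.outer(noise[l], clean[l])

# Toy usage: a few NP steps on a single input/target pair with a 4-16-2 network.
Ws = [0.1 * rng.standard_normal((16, 4)), 0.1 * rng.standard_normal((2, 16))]
x, y = rng.standard_normal(4), rng.standard_normal(2)
for _ in range(100):
    np_update(x, y, Ws)
```

Because the update uses only the two loss evaluations and the injected noise, it requires no derivatives of the network itself, which is what makes NP attractive for systems with discontinuities or noisy node dynamics; the instability of this unguided search is what the paper's directional-derivative alignment and layer-wise input decorrelation aim to address.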

Authors (3)
  1. Sander Dalm (5 papers)
  2. Marcel van Gerven (48 papers)
  3. Nasir Ahmad (27 papers)
