Gradient-Free Training of Recurrent Neural Networks using Random Perturbations (2405.08967v3)

Published 14 May 2024 in cs.LG

Abstract: Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle to propagate gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. We subsequently conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability compared to BPTT, strongly outperforming standard node and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs, which can be ideally suited for neuromorphic computing applications.
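
The abstract's description of perturbation-based learning, namely two forward passes plus a single global reinforcement signal and no backward pass, can be made concrete with a small sketch. The code below is a minimal, illustrative implementation of classic node perturbation on a vanilla NumPy RNN, not the paper's activity-based node perturbation (ANP) update; the toy task, network sizes, and hyperparameters are assumptions chosen only for demonstration.

```python
# Minimal sketch of node-perturbation learning in a vanilla RNN (NumPy only).
# It illustrates the idea described in the abstract: one clean forward pass,
# one forward pass with noise injected into the hidden units, and a single
# global loss difference used as the reinforcement signal. This is NOT the
# paper's ANP algorithm; task, sizes and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 32, 1, 20

# Vanilla RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}), y = W_out h_T
W_in = rng.normal(0, 0.3, (n_hid, n_in))
W_rec = rng.normal(0, 0.3, (n_hid, n_hid))
W_out = rng.normal(0, 0.3, (n_out, n_hid))

def forward(x_seq, noise=None):
    """Run the RNN, optionally adding noise to the hidden pre-activations."""
    h = np.zeros(n_hid)
    states = []
    for t, x in enumerate(x_seq):
        pre = W_in @ x + W_rec @ h
        if noise is not None:
            pre = pre + noise[t]
        h = np.tanh(pre)
        states.append(h)
    return W_out @ h, np.array(states)

def loss(y, target):
    return 0.5 * np.sum((y - target) ** 2)

lr, sigma = 1e-3, 0.05  # assumed step size and perturbation scale
for step in range(2000):
    # Toy regression task: predict the mean of a random input sequence.
    x_seq = rng.normal(size=(T, n_in))
    target = np.array([x_seq.mean()])

    # Clean and perturbed forward passes (no backward pass anywhere).
    y_clean, h_clean = forward(x_seq)
    xi = rng.normal(0, sigma, (T, n_hid))      # node perturbations over time
    y_pert, _ = forward(x_seq, noise=xi)

    # Global scalar feedback: how much the injected noise changed the loss.
    dL = loss(y_pert, target) - loss(y_clean, target)

    # Node-perturbation update: noise that increased the loss acts as a
    # pseudo-gradient on each unit's pre-activation, paired with the
    # presynaptic activity at the same timestep.
    g = (dL / sigma**2) * xi                   # shape (T, n_hid)
    h_prev = np.vstack([np.zeros(n_hid), h_clean[:-1]])
    W_in -= lr * g.T @ x_seq / T
    W_rec -= lr * g.T @ h_prev / T
    W_out -= lr * np.outer(y_clean - target, h_clean[-1])  # exact gradient for the linear readout
```

The linear readout is trained with its exact gradient here purely to keep the sketch short; in a fully perturbation-based scheme it would be updated from the same loss difference as the recurrent weights.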

Authors (3)
  1. Jesus Garcia Fernandez (5 papers)
  2. Sander Keemink (3 papers)
  3. Marcel van Gerven (48 papers)
Citations (1)
