
Stabilizing Spiking Neuron Training (2202.00282v4)

Published 1 Feb 2022 in cs.NE and cs.AI

Abstract: Stability arguments are often used to prevent learning algorithms from developing the ever-increasing activity and weights that hinder generalization. However, stability conditions can clash with the sparsity required to improve the energy efficiency of spiking neurons, even though stability analysis can also suggest solutions to this tension. Spiking neuromorphic computing uses binary activity to improve the energy efficiency of Artificial Intelligence, but this non-smoothness requires approximate gradients, known as Surrogate Gradients (SG), to close the performance gap with Deep Learning. Several SG have been proposed in the literature, yet it remains unclear how to determine the best SG for a given task and network. We therefore aim to define the best SG theoretically, through stability arguments, to reduce the need for grid search. We show that more complex tasks and networks require a more careful choice of SG, even though the derivative of the fast sigmoid tends to outperform the alternatives overall, across a wide range of learning rates. We then design a stability-based theoretical method to choose the initialization and SG shape before training the most common spiking neuron, the Leaky Integrate-and-Fire (LIF). Since our stability method suggests high firing rates at initialization, which is non-standard in the neuromorphic literature, we show that high initial firing rates, combined with a gradually introduced sparsity-encouraging loss term, can lead to better generalization, depending on the SG shape. Our stability-based theoretical solution finds an SG and initialization that experimentally improve accuracy, and we show how it can reduce the need for extensive grid search over the dampening, sharpness, and tail-fatness of the SG. We also show that our stability concepts extend to other LIF variants, such as DECOLLE and fluctuation-driven initializations.
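The core mechanism above, a binary spike in the forward pass paired with a smooth approximate derivative in the backward pass, can be made concrete with a short sketch. Below is a minimal PyTorch illustration (an assumed framework; the page does not prescribe one) of a discrete-time LIF step that uses the derivative of the fast sigmoid as the SG. The `dampening` and `sharpness` parameters stand in for the SG shape parameters named in the abstract; all numeric values are illustrative assumptions, not the paper's settings.

    import torch

    class FastSigmoidSpike(torch.autograd.Function):
        """Heaviside step in the forward pass; derivative of the fast
        sigmoid x / (1 + k|x|) as the surrogate gradient in the backward."""

        dampening = 0.3   # SG height (illustrative value)
        sharpness = 10.0  # SG slope k (illustrative value)

        @staticmethod
        def forward(ctx, v_centered):
            ctx.save_for_backward(v_centered)
            return (v_centered > 0).float()

        @staticmethod
        def backward(ctx, grad_output):
            (v_centered,) = ctx.saved_tensors
            # d/dx [x / (1 + k|x|)] = 1 / (1 + k|x|)^2, scaled by dampening
            k = FastSigmoidSpike.sharpness
            sg = FastSigmoidSpike.dampening / (1.0 + k * v_centered.abs()) ** 2
            return grad_output * sg

    def lif_step(v, x, w, decay=0.9, v_thr=1.0):
        """One discrete-time LIF update: leak, integrate, fire, soft reset."""
        v = decay * v + x @ w                  # leaky integration of input current
        z = FastSigmoidSpike.apply(v - v_thr)  # binary spikes; SG used in backward
        v = v - z * v_thr                      # reset by subtraction after a spike
        return v, z

Initializing weights or biases so that membrane potentials start near or above `v_thr` yields the high initial firing rates the abstract advocates; the SG shape then determines how informative the gradients remain in that regime.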

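The abstract's second ingredient, a sparsity-encouraging loss term introduced gradually, is commonly realized as a firing-rate regularizer scaled by a ramp schedule. The sketch below shows one plausible form under that assumption; the target rate, ramp length, and quadratic penalty are hypothetical choices, not taken from the paper.

    def sparsity_loss(spikes, target_rate=0.02):
        # Quadratic penalty on the mean firing rate above a target rate;
        # a common regularizer, not necessarily the paper's exact term.
        return (spikes.mean() - target_rate).clamp(min=0.0) ** 2

    def total_loss(task_loss, spikes, step, ramp_steps=10_000, beta=1.0):
        # Ramp the sparsity term in linearly, so training starts in the
        # high-firing-rate regime favored by the stability analysis and
        # only later trades activity for energy efficiency.
        ramp = min(1.0, step / ramp_steps)
        return task_loss + beta * ramp * sparsity_loss(spikes)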
References (42)
  1. Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München, 91(1), 1991.
  2. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
  3. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  4. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.
  5. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, pages 249–256. JMLR Workshop and Conference Proceedings, 2010.
  6. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014.
  7. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, pages 1026–1034, 2015.
  8. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Cambridge University Press, 2022.
  9. Towards the systematic reporting of the energy and carbon footprints of machine learning. JMLR, 21(248):1–43, 2020.
  10. Benchmarking keyword spotting efficiency on neuromorphic hardware. In Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop, pages 1–8, 2019.
  11. Advancing neuromorphic computing with loihi: A survey of results and outlook. Proceedings of the IEEE, 109(5):911–934, 2021.
  12. Louis Lapique. Recherches quantitatives sur l'excitation électrique des nerfs traitée comme une polarisation. Journal of Physiology and Pathology, 9:620–635, 1907.
  13. Eugene M Izhikevich. Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6):1569–1572, 2003.
  14. Towards biologically-plausible neuron models and firing rates in high-performance deep spiking neural networks. In International Conference on Neuromorphic Systems (ICONS 2021), New York, NY, USA, 2021. Association for Computing Machinery.
  15. Fluctuation-driven initialization for spiking neural network training. Neuromorphic Computing and Engineering, 2(4):044016, 2022.
  16. The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks. Neural Computation, 33(4):899–925, 2021.
  17. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
  18. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, pages 462–466, 1952.
  19. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  20. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1-4):17–37, 2002.
  21. Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences, 113(41):11441–11446, 2016.
  22. Superspike: Supervised learning in multilayer spiking neural networks. Neural Computation, 30(6):1514–1541, 2018.
  23. Long short-term memory and learning-to-learn in networks of spiking neurons. In NeurIPS, 2018.
  24. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6):51–63, 2019.
  25. Sander M. Bohte. Error-backpropagation in networks of fractionally predictive spiking neurons. In ICANN, pages 60–68. Springer Berlin Heidelberg, 2011.
  26. Binarized neural networks. In NeurIPS, volume 29. Curran Associates, Inc., 2016.
  27. Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks. Nature Machine Intelligence, 3(10):905–913, 2021.
  28. On the initialization of long short-term memory networks. In ICONIP, pages 275–286. Springer, 2019.
  29. Unitary evolution recurrent neural networks. In International Conference on Machine Learning, pages 1120–1128. PMLR, 2016.
  30. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318. PMLR, 2013.
  31. Binarized neural networks. NeurIPS, 29, 2016.
  32. S. Shrestha and G. Orchard. Slayer: Spike layer error reassignment in time. In NeurIPS, 2018.
  33. Effective and efficient computation with multiple-timescale spiking recurrent neural networks. In International Conference on Neuromorphic Systems (ICONS 2020), New York, NY, USA, 2020. Association for Computing Machinery.
  34. Richard P Brent. Fast multiple-precision evaluation of elementary functions. Journal of the ACM (JACM), 23(2):242–251, 1976.
  35. Timm Ahrendt. Fast computations of the exponential function. In Christoph Meinel and Sophie Tison, editors, STACS 99, pages 302–312, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.
  36. Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.
  37. Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nature Machine Intelligence, 2(6):325–336, 2020.
  38. How to start training: The effect of initialization and architecture. In NeurIPS, 2018.
  39. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR, 2015.
  40. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  41. The mean, median, and mode of unimodal distributions: a characterization. Theory of Probability & Its Applications, 41(2):210–223, 1997.
  42. Synaptic plasticity dynamics for deep continuous local learning (decolle). Frontiers in Neuroscience, 14, 2020.