
Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks (2404.14964v3)

Published 23 Apr 2024 in cs.NE and q-bio.NC

Abstract: Training spiking neural networks to approximate universal functions is essential for studying information processing in the brain and for neuromorphic computing. Yet the binary nature of spikes poses a challenge for direct gradient-based training. Surrogate gradients have been empirically successful in circumventing this problem, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to the lack of support for automatic differentiation, are impractical for training multi-layer spiking neural networks but provide derivatives equivalent to surrogate gradients for single neurons. On the other hand, we investigate stochastic automatic differentiation, which is compatible with discrete randomness but has not yet been used to train spiking neural networks. We find that the latter gives surrogate gradients a theoretical basis in stochastic spiking neural networks, where the surrogate derivative matches the derivative of the neuronal escape noise function. This finding supports the effectiveness of surrogate gradients in practice and suggests their suitability for stochastic spiking neural networks. However, surrogate gradients are generally not gradients of a surrogate loss despite their relation to stochastic automatic differentiation. Nevertheless, we empirically confirm the effectiveness of surrogate gradients in stochastic multi-layer spiking neural networks and discuss their relation to deterministic networks as a special case. Our work gives theoretical support to surrogate gradients and the choice of a suitable surrogate derivative in stochastic spiking neural networks.

Citations (2)

Summary

  • The paper demonstrates that, for single-layer perceptrons, surrogate gradients coincide with the gradients of the expected output under smoothed probabilistic models when the surrogate derivative matches the escape noise function.
  • It shows that traditional SPMs do not scale to multi-layer networks, whereas stochAD streamlines gradient computation for deeper SNN architectures.
  • The study extends its findings to biophysically plausible LIF neuron models, validating the approach under both deterministic and stochastic training conditions.

Surrogate Gradients in Stochastic Spiking Neural Networks: Theoretical Insights and Practical Implications

Introduction

Spiking Neural Networks (SNNs) are crucial for understanding brain dynamics and for developing neuromorphic computing systems. Unlike traditional Artificial Neural Networks (ANNs), SNNs process information through discrete spike events, which makes conventional gradient-based learning difficult because spike generation is non-differentiable. Surrogate Gradient (SG) methods are commonly deployed to train SNNs despite lacking a rigorous theoretical foundation. This article investigates SG methods by comparing them with Smoothed Probabilistic Models (SPMs) and with Stochastic Automatic Differentiation (stochAD), an approach not previously applied to SNN training, to establish a theoretically grounded framework for training stochastic SNNs effectively.
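To make the surrogate-gradient idea concrete, the sketch below (a minimal PyTorch illustration, not the paper's code; the sigmoid surrogate and its steepness are assumptions) keeps the hard threshold in the forward pass and substitutes a smooth sigmoid derivative in the backward pass:

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate derivative in the backward pass."""

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u > 0).float()  # binary spike: 1 if the membrane potential exceeds threshold

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Surrogate derivative: derivative of a sigmoid with steepness beta,
        # used in place of the ill-defined derivative of the Heaviside step.
        beta = 10.0
        sig = torch.sigmoid(beta * u)
        return grad_output * beta * sig * (1.0 - sig)

spike_fn = SurrogateSpike.apply
```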

Methodological Analysis

The Nexus between Surrogate Gradients, SPMs, and stochAD

SG approaches, which have demonstrated empirical success, substitute the non-differentiable spike function with a differentiable proxy during backpropagation. SPMs instead leverage stochasticity to smooth the optimization landscape and thereby make gradients well defined, but their extension to deep SNN architectures is limited by their incompatibility with automatic differentiation (AD). stochAD, by contrast, accommodates discrete randomness within differentiation and therefore promises broader applicability to training layered SNNs, although it had not previously been applied to them.
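For intuition on the SPM view, here is a small sketch (the function name, the sigmoid escape noise, and the steepness value are illustrative assumptions) of escape-noise spiking, where each sample is binary but its expectation is a smooth function of the membrane potential:

```python
import torch

def escape_noise_spike(u, beta=10.0):
    """Sample a binary spike with probability given by a sigmoid escape noise function.

    The sample itself is discrete, but its expectation E[S] = sigmoid(beta * u) is smooth in u,
    which is the quantity a smoothed probabilistic model differentiates.
    """
    p = torch.sigmoid(beta * u)  # escape noise: firing probability as a function of membrane potential
    return torch.bernoulli(p), p
```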

Single and Multi-layer Perceptron Models

Starting with binary perceptrons, the analysis shows that surrogate gradients in single-layer perceptrons can equal the gradients of the expected output under SPMs, provided the surrogate derivative matches the escape noise function. When extended to multi-layer perceptrons (MLPs), traditional SPMs break down because the expectation-smoothing approach does not decompose gradient calculations across layers, which prevents backpropagation. In contrast, stochAD provides smoothed stochastic derivatives that keep gradient computation tractable even in multi-layer settings.
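The single-neuron equivalence can be checked numerically. The snippet below (illustrative values; it assumes a sigmoid escape noise with steepness beta) compares the gradient of the expected output, as an SPM would compute it, with the surrogate gradient built from the matching sigmoid derivative:

```python
import torch

# Single stochastic binary perceptron with sigmoid escape noise (illustrative values).
torch.manual_seed(0)
x = torch.randn(5)
w = torch.randn(5, requires_grad=True)
beta = 5.0

# Smoothed probabilistic model: differentiate the expected output E[S] = sigmoid(beta * w.x) directly.
p = torch.sigmoid(beta * (w @ x))
p.backward()
grad_spm = w.grad.clone()

# Surrogate gradient: chain rule with the sigmoid derivative in place of the Heaviside's derivative.
u = (w @ x).detach()
sig = torch.sigmoid(beta * u)
grad_sg = beta * sig * (1.0 - sig) * x

print(torch.allclose(grad_spm, grad_sg))  # True: the two coincide for a single neuron
```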

Extendibility to Leaky Integrate-and-Fire Neurons

Further examination shows that the theoretical results developed for perceptrons carry over to more biophysically plausible models such as Leaky Integrate-and-Fire (LIF) neurons. The translation is straightforward because LIF neurons likewise emit binary spike outputs, even though they add temporal membrane dynamics.
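As a sketch of how this looks in discrete time (reusing the SurrogateSpike function from the introduction; the leak factor, threshold, and subtractive reset are illustrative modeling choices, not the paper's exact formulation):

```python
import torch

def lif_step(u, x, w, beta=0.9, threshold=1.0):
    """One discrete-time update of a layer of leaky integrate-and-fire neurons.

    u: membrane potentials, x: presynaptic spikes, w: weight matrix.
    beta (leak) and threshold are illustrative values, not taken from the paper.
    """
    u = beta * u + x @ w                      # leaky integration of synaptic input
    s = SurrogateSpike.apply(u - threshold)   # binary spikes; surrogate derivative in the backward pass
    u = u - s * threshold                     # subtractive reset after a spike
    return u, s
```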

Empirical Verification through Simulations

Training Under Stochastic and Deterministic Conditions

Computational experiments validate that both deterministic and stochastic SNNs trained via SG approaches can undertake complex tasks such as pattern recognition and time-series prediction with comparable proficiency. This viability extends to SNNs characterized by multiple noisy processing layers, potentially simulating real-world applications where input signals or processing elements are inherently stochastic.
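One way to express this in code is a straight-through-style spike function in which the forward pass is either a Bernoulli sample from the escape noise or a hard threshold, while the backward pass always follows the sigmoid surrogate path. This is a sketch of the general idea, not the paper's implementation:

```python
import torch

def spike(u, beta=10.0, stochastic=True):
    """Binary spike with a shared surrogate backward path.

    Forward: Bernoulli sample from the escape noise (stochastic) or a hard threshold (deterministic).
    Backward: gradients flow through sigmoid(beta * u), i.e. the surrogate derivative equals the
    derivative of the escape noise function. Straight-through construction; illustrative only.
    """
    p = torch.sigmoid(beta * u)
    s = torch.bernoulli(p) if stochastic else (u > 0).float()
    return p + (s - p).detach()  # value equals the binary sample s; gradient equals that of p
```

Setting stochastic=False recovers a deterministic network as a special case while leaving the backward pass unchanged.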

Noteworthy Implications for Theoretical and Practical Fields

The successful training of stochastic SNNs underscores the potential of SG methods, grounded in the stochAD framework, for advancing neuromorphic technologies. These findings could inform the design of next-generation computational models that more closely emulate the statistical nature of neuronal operation in the brain, enabling deeper insight into neural computation and learning.

Conclusion and Future Directions

This paper furnishes a much-needed theoretical foundation for SG methods in the training of stochastic SNNs, with stochAD providing a robust framework that supports effective learning across varied network architectures. Future research could focus on improving the performance and computational efficiency of SG-based learning strategies, building on this foundational understanding to tackle more complex, noise-prone computational tasks across scientific and technological domains.
