SuperSpike: Supervised Learning in SNNs

Updated 28 March 2026

SuperSpike is a supervised learning framework for multilayer networks of LIF neurons that overcomes nondifferentiability using surrogate gradient methods.
It employs a three-factor plasticity rule combining postsynaptic voltage sensitivity, presynaptic activity, and an error signal for effective credit assignment.
The framework demonstrates precise spike-timing control and robust generalization on tasks like temporal XOR classification and high-dimensional transformations.

SuperSpike is a supervised learning framework for multilayer networks of deterministic leaky integrate-and-fire (LIF) neurons, developed to address the intrinsic challenges of credit assignment and nondifferentiability in spiking neural networks (SNNs). By introducing surrogate gradient methods, SuperSpike enables efficient end-to-end learning of nonlinear spatiotemporal transformations between spike-timing patterns. The approach centers on a three-factor plasticity rule, combining postsynaptic voltage sensitivity, presynaptic activity, and an error signal, and explores biologically plausible credit-assignment strategies, such as random and symmetric feedback alignment. SuperSpike advances both computational neuroscience and neuromorphic machine learning by connecting algorithmic results with principles of synaptic plasticity observed in vivo (Zenke et al., 2017, Neftci et al., 2019).

1. LIF Neuron Model and Spiking Dynamics

SuperSpike adopts current-based LIF neurons with discrete- or continuous-time dynamics. The model state per neuron includes the membrane potential $U_i[n]$ (or $V_i(t)$ in continuous time) and synaptic current $I_i[n]$. In discrete time, the updates are:

Synaptic current:

$I_i[n+1] = \alpha I_i[n] + \sum_j W_{ij} S_j[n] + \sum_j V_{ij} S_j[n]$

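with $\alpha=\exp(-\frac{\Delta t}{\tau_{\mathrm{syn}}})$. Here $W_{ij}$ are feedforward weights and $V_{ij}$ are recurrent weights: the first sum runs over presynaptic spikes arriving from the previous layer, the second over spikes emitted within the same layer.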

Membrane potential:

$U_i[n+1] = \beta U_i[n] + I_i[n] - S_i[n]$

with $\beta = \exp(-\frac{\Delta t}{\tau_{\mathrm{mem}}})$.

Spiking output:

$S_i[n] = \Theta(U_i[n] - \theta),$

where $\Theta$ is the Heaviside step and $S_i[n]=1$ triggers a membrane reset through the $-S_i[n]$ term of the membrane update (a subtraction of exactly one threshold when $\theta = 1$). In continuous time, standard LIF dynamics and spike-response models are used, with synaptic inputs integrated via causal exponential kernels (Neftci et al., 2019, Zenke et al., 2017).
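The discrete-time updates above can be sketched for a single neuron as follows. This is a minimal illustration, not the reference implementation: the function name `simulate_lif`, the time constants, and the choice $\theta = 1$ are assumptions, and the recurrent term $V_{ij}$ is left out for brevity.

```python
import math

def simulate_lif(input_spikes, weights, tau_syn=5e-3, tau_mem=10e-3,
                 dt=1e-3, theta=1.0):
    """Simulate one current-based LIF neuron in discrete time.

    input_spikes: list over time steps, each a list of 0/1 values per input.
    weights: one feedforward weight per input (recurrent weights omitted).
    Returns (U, S): membrane potential and output spike at every step.
    All default parameter values are illustrative assumptions.
    """
    alpha = math.exp(-dt / tau_syn)  # synaptic decay per step
    beta = math.exp(-dt / tau_mem)   # membrane decay per step
    I, U = 0.0, 0.0
    U_hist, S_hist = [], []
    for s_in in input_spikes:
        # S[n] = Theta(U[n] - theta)
        S = 1.0 if U > theta else 0.0
        U_hist.append(U)
        S_hist.append(S)
        drive = sum(w * s for w, s in zip(weights, s_in))
        # I[n+1] = alpha * I[n] + sum_j W_j S_j[n]
        # U[n+1] = beta * U[n] + I[n] - S[n]   (reset subtracts 1)
        # The right-hand side is evaluated with the old I before assignment.
        I, U = alpha * I + drive, beta * U + I - S
    return U_hist, S_hist
```

Calling `simulate_lif([[1]] * 50, [0.5])` drives the neuron with one input that spikes at every step. Because the reset subtracts 1 rather than $\theta$, the sketch is only consistent with the threshold when $\theta = 1$.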

The nondifferentiability of $\Theta$ represents a central obstacle for gradient-based learning: the spike train's derivative is zero almost everywhere and undefined at the threshold.

2. Surrogate Gradient Approach

To circumvent non-differentiability, SuperSpike replaces the derivative of the spiking nonlinearity with a smooth surrogate. Writing $x = U - \theta$ for the distance to threshold, one common choice is the logistic sigmoid:

$\sigma(x) = \frac{1}{1 + \exp(-\gamma x)}, \quad \sigma'(x) = \frac{\gamma \exp(-\gamma x)}{(1 + \exp(-\gamma x))^2} = \gamma\,\sigma(x)[1-\sigma(x)].$

The original SuperSpike formulation instead uses the derivative of a "fast sigmoid", $\sigma(x) = x/(1+\gamma|x|)$, which avoids evaluating exponentials:

$\sigma'(x) = \frac{1}{(1+\gamma|x|)^2}.$

The steepness parameter $\gamma$ (written $\beta$ in some sources, not to be confused with the membrane decay factor $\beta$ above) determines the narrowness of the transition region around threshold. During backpropagation, the hard threshold's derivative $\Theta'(U-\theta)$ is substituted by $\sigma'(U-\theta)$, enabling the computation of partial derivatives through the spiking operation. The forward pass still uses the hard threshold; only the gradient is approximated. This allows standard (or approximated) gradient-based optimization in SNNs (Zenke et al., 2017, Neftci et al., 2019).
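As a minimal sketch of this substitution, the functions below pair the hard threshold used in the forward pass with two surrogate derivatives: the logistic form $\gamma\,\sigma(1-\sigma)$ and the rational form $1/(1+\gamma|x|)^2$. The function names and the default values of `theta` and `gamma` are assumptions chosen for illustration.

```python
import math

def spike(u, theta=1.0):
    """Forward pass: hard threshold Theta(u - theta)."""
    return 1.0 if u > theta else 0.0

def logistic_grad(u, theta=1.0, gamma=10.0):
    """Surrogate derivative gamma * s * (1 - s), s = logistic(gamma * (u - theta))."""
    x = gamma * (u - theta)
    # Numerically stable logistic: never exponentiate a large positive number.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        s = z / (1.0 + z)
    return gamma * s * (1.0 - s)

def fast_sigmoid_grad(u, theta=1.0, gamma=10.0):
    """Surrogate derivative 1 / (1 + gamma * |u - theta|)^2."""
    return 1.0 / (1.0 + gamma * abs(u - theta)) ** 2

if __name__ == "__main__":
    # Both surrogates peak at threshold and decay on either side,
    # whereas the true derivative of spike() is zero away from threshold.
    for u in (0.0, 0.9, 1.0, 1.1, 2.0):
        print(u, spike(u), logistic_grad(u), fast_sigmoid_grad(u))
```

Both surrogates are symmetric around threshold. The rational form has heavier tails, so neurons far from threshold still receive a small learning signal.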

3. Loss Function and Online Three-Factor Learning Rule

SuperSpike targets precise spike-timing supervision using the van Rossum distance between actual and target spike trains, smoothed by a causal kernel $\epsilon(t)$:

Continuous time:

$L = \frac12 \int_{-\infty}^T [\epsilon * (S_k(t) - S_k^*(t))]^2 \, dt$

Discrete time:

$L \approx \frac12 \sum_n [\epsilon * (S_k[n] - S_k^*[n])]^2$

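The per-timestep error signal is the filtered difference between the target and the actual spike train, signed so that a positive learning rate $\eta$ descends the loss:

$e_k[n] = (\epsilon * (S_k^* - S_k))[n]$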
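The discrete-time loss can be evaluated with a recursive filter instead of an explicit convolution. The sketch below assumes a single-exponential causal kernel with a hypothetical time constant `tau`; this is a simplification for illustration, since the kernel $\epsilon$ is not fixed by the text above. The constant factor $\Delta t$ is dropped, as in the discrete formula.

```python
import math

def exp_filter(x, tau=10e-3, dt=1e-3):
    """Causal exponential filter: y[n] = decay * y[n-1] + x[n]."""
    decay = math.exp(-dt / tau)
    y, acc = [], 0.0
    for v in x:
        acc = decay * acc + v
        y.append(acc)
    return y

def van_rossum_loss(output, target, tau=10e-3, dt=1e-3):
    """0.5 * sum_n (eps * (S - S*))[n]^2 for two 0/1 spike trains.

    Convolution is linear, so filtering the difference equals the
    difference of the filtered trains.
    """
    diff = [o - t for o, t in zip(output, target)]
    filtered = exp_filter(diff, tau, dt)
    return 0.5 * sum(d * d for d in filtered)

def spike_train(times, n_steps):
    """Helper (hypothetical): 0/1 list with spikes at the given step indices."""
    s = [0.0] * n_steps
    for t in times:
        s[t] = 1.0
    return s
```

For example, `van_rossum_loss(spike_train([10], 100), spike_train([12], 100))` is smaller than the same call with the target spike at step 40. The loss therefore grows smoothly with timing error instead of jumping between "hit" and "miss".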

The gradient of the loss with respect to weights, after applying surrogate derivatives and collecting terms, yields a local three-factor rule:

$\Delta W_{ij}[n] = \eta \, e_i[n] \, \mathrm{Tr}_{ij}[n]$

where the eligibility trace $\mathrm{Tr}_{ij}[n]$ evolves as

$\mathrm{Tr}_{ij}[n+1] = \alpha\,\mathrm{Tr}_{ij}[n] + \sigma'(U_i[n]-\theta) S_j[n]$

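The three factors are the postsynaptic voltage (via $\sigma'$), presynaptic spiking activity ($S_j[n]$, which the original SuperSpike formulation additionally low-pass filters into a presynaptic trace before it enters the eligibility trace), and the feedback error ($e_i[n]$). Together they capture voltage-based and error-gated synaptic plasticity in a manner aligned with experimental neuromodulatory and spike-timing-dependent plasticity phenomena (Zenke et al., 2017, Neftci et al., 2019).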
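The online rule can be sketched for a single output neuron as below. This follows the simplified trace recursion written above, not the full original formulation. Several choices are assumptions for illustration: the error is a single-exponential filter of target minus output (the sign under which a positive $\eta$ reduces the loss), the surrogate is $1/(1+\gamma|x|)^2$, and the names `train_online` and `tau_err` and all default values are hypothetical.

```python
import math

def surrogate_grad(u, theta=1.0, gamma=10.0):
    """Assumed surrogate derivative 1 / (1 + gamma * |u - theta|)^2."""
    return 1.0 / (1.0 + gamma * abs(u - theta)) ** 2

def train_online(weights, input_spikes, target, eta=1e-2, tau_syn=5e-3,
                 tau_mem=10e-3, tau_err=10e-3, dt=1e-3, theta=1.0):
    """One pass of the three-factor rule over a spike-train pair.

    input_spikes: list over time steps of per-input 0/1 lists.
    target: list over time steps of 0/1 target output spikes.
    Returns (updated weights, emitted output spikes).
    """
    alpha = math.exp(-dt / tau_syn)   # synaptic and trace decay
    beta = math.exp(-dt / tau_mem)    # membrane decay
    kappa = math.exp(-dt / tau_err)   # error-filter decay (assumed kernel)
    w = list(weights)
    trace = [0.0] * len(w)
    I = U = e = 0.0
    out = []
    for s_in, s_target in zip(input_spikes, target):
        S = 1.0 if U > theta else 0.0
        out.append(S)
        # Factor 3: filtered error, target minus output.
        e = kappa * e + (s_target - S)
        # Delta W_j[n] = eta * e[n] * Tr_j[n]
        for j in range(len(w)):
            w[j] += eta * e * trace[j]
        # Tr_j[n+1] = alpha * Tr_j[n] + sigma'(U[n] - theta) * S_j[n]
        sg = surrogate_grad(U, theta)            # factor 1: voltage
        for j, s in enumerate(s_in):             # factor 2: presynaptic
            trace[j] = alpha * trace[j] + sg * s
        # LIF dynamics with the freshly updated weights.
        drive = sum(wj * s for wj, s in zip(w, s_in))
        I, U = alpha * I + drive, beta * U + I - S
    return w, out
```

A synapse whose input never spikes keeps a zero trace and is never updated, however large the error. This is the locality property that makes the rule "three-factor" rather than a purely error-driven one.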

4. Spatial Credit Assignment and Feedback Alignment

In multilayer networks, hidden units' error signals must be computed from output-layer errors for spatial credit assignment. SuperSpike considers three feedback strategies:

Symmetric feedback (transpose of the forward weights, as in backpropagation):

$e_i(t) = \sum_k W_{ki} e_k(t)$

Random feedback (feedback alignment):

$e_i(t) = \sum_k b_{ki} e_k(t)$

where $b_{ki}$ are fixed, randomly drawn elements.

Uniform feedback (global third factor):

$e_i(t) = \sum_k e_k(t)$

These schemes allow exploration of a continuum between strict backpropagation and biologically plausible error broadcast. On simple tasks, all strategies can suffice, but increasingly complex spatiotemporal tasks require more targeted error signals—symmetric feedback or well-aligned random feedback. Uniform feedback fails on nonlinear transformations, demonstrating that local, neuron- or synapse-specific error information is required for deep SNN learning (Zenke et al., 2017).
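The three feedback schemes differ only in the matrix that projects output errors onto hidden units, which the following sketch makes explicit. The function names, the nested-list weight layout, and the Gaussian draw for the random matrix are assumptions for illustration.

```python
import random

def random_feedback(n_out, n_hidden, seed=0):
    """Fixed random feedback matrix b[k][i], drawn once and never trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
            for _ in range(n_out)]

def hidden_errors(output_errors, W_out, strategy="symmetric", B=None):
    """Project output errors e_k onto hidden units i.

    W_out[k][i] is the forward weight from hidden unit i to output unit k.
    strategy: "symmetric" uses W_out, "random" uses the fixed matrix B,
    "uniform" broadcasts the summed error to every hidden unit.
    """
    n_out, n_hidden = len(W_out), len(W_out[0])
    if strategy == "symmetric":
        F = W_out
    elif strategy == "random":
        if B is None:
            raise ValueError("random feedback needs a fixed matrix B")
        F = B
    elif strategy == "uniform":
        F = [[1.0] * n_hidden for _ in range(n_out)]
    else:
        raise ValueError("unknown strategy: " + strategy)
    # e_i = sum_k F[k][i] * e_k
    return [sum(F[k][i] * output_errors[k] for k in range(n_out))
            for i in range(n_hidden)]
```

With `W_out = [[1, 2, 3], [4, 5, 6]]` and output errors `[1.0, -0.5]`, symmetric feedback gives each hidden unit a different signal, while uniform feedback delivers the same scalar to all of them. That loss of unit-specific information is what limits uniform feedback on nonlinear tasks.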

5. Benchmark Tasks and Empirical Results

SuperSpike has been validated on a suite of SNN tasks emphasizing precise temporal coding:

Single-neuron pattern learning: A single LIF neuron with 100 Poisson inputs was trained to emit a prescribed set of precisely timed output spikes. Convergence to the target was rapid (within a few hundred simulated seconds).
Multi-layer mappings: Shallow networks (input, hidden, output; 2–5 layers) learned mappings from spatiotemporal input patterns to desired output spike trains, including arbitrary target spike-time sequences.
Temporal XOR classification: Networks with one hidden layer produced perfectly timed spike outputs discriminating spatiotemporal input patterns with nonlinear separability.
High-dimensional transformations: With symmetric feedback and sufficiently large hidden layers (≥ 32 neurons), deep SNNs were trained to map 100-dimensional cyclic inputs to 100-output spike trains, achieving rapid convergence and biologically plausible firing statistics.

SuperSpike consistently achieves low training error and perfect (or near-perfect) generalization on these nonlinear timing tasks, outperforming non-hierarchical and uniform-feedback models on the most difficult benchmarks. In follow-up work, similar surrogate-gradient-based three-factor rules in deep convolutional SNNs achieved performance on dynamic vision sensor (DVS)-Gesture data competitive with BPTT-based SNNs, while reducing memory requirements and training steps (Neftci et al., 2019).

6. Biological and Algorithmic Significance

SuperSpike forges a conceptual link between experimentally observed forms of synaptic plasticity and algorithmic principles of credit assignment:

The three-factor rule reflects neuromodulator-gated, voltage- and spike-dependent synaptic updates observed in cortical and subcortical circuits.
Surrogate gradients operationalize the notion that biological neurons respond nonlinearly near threshold, but may exhibit smooth sensitivity profiles under certain conditions.
Feedback alignment results indicate that several degrees of biological approximation—randomized connectivity or error broadcasting—can support learning of simple tasks, but targeted error signals are necessary for deep, nonlinear mappings.

This suggests possible directions both for improved neuromorphic SNN training and for understanding credit assignment mechanisms in biological networks (Zenke et al., 2017, Neftci et al., 2019).

7. References and Implementations

Foundational descriptions and detailed implementations of SuperSpike are provided in:

F. Zenke & S. Ganguli, "SuperSpike: Supervised Learning in Multilayer Spiking Neural Networks", Neural Computation, 2018 (Zenke et al., 2017).
J. Kaiser, H. Mostafa & E. Neftci, "Synaptic Plasticity Dynamics for Deep Continuous Local Learning (DECOLLE)", 2018.
Tutorials and practical guidance for surrogate-gradient SNNs are detailed in "Surrogate Gradient Learning in Spiking Neural Networks" (Neftci et al., 2019).