First-to-Spike Decoding in SNNs

Updated 13 May 2026

First-to-spike decoding is a temporal neural coding paradigm in spiking neural networks that determines outputs based on the earliest spike emitted, emphasizing sparse and low-latency computation.
Training typically relies on softmax cross-entropy over exponentially decayed spike-time scores and on surrogate gradient methods that work around the non-differentiability of spike events.
The approach is central to neuromorphic systems and energy-constrained AI, where it has been reported to substantially reduce latency and energy consumption compared with conventional rate coding.

First-to-spike decoding is a temporal neural readout strategy for spiking neural networks (SNNs) in which information is encoded and decoded solely through the precise timing of the first spike emitted by each output neuron. Instead of aggregating over spike counts or rates, classification or control decisions are determined as soon as the earliest spike occurs at the output layer, yielding inherently sparse, low-latency, and energy-efficient computation. First-to-spike decoding underpins time-to-first-spike (TTFS) and related latency coding paradigms, now used extensively in neuromorphic machine learning, event-based vision, and energy-constrained AI systems.

1. Formal Definition and Decoding Rule

Let $N$ denote the number of output neurons (typically equal to the number of classes or available actions), and $T$ the maximum number of discrete time steps considered ( $t = 0,1,\dots,T-1$ ). Each output neuron $i$ emits a spike train $S_i[t] \in \{0,1\}$ , with $S_i[t]=1$ if neuron $i$ fires at time $t$ (otherwise 0). Under strict TTFS coding, each output neuron fires at most once during $t=0\dots T-1$ , enforced by either hard refractory periods or architectural constraints (Che et al., 2024).

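The first spike time for neuron $i$ is

$$t_i = \min\{\, t \in \{0,\dots,T-1\} : S_i[t] = 1 \,\}.$$

If no spike occurs by $t = T-1$, the default assignment $t_i = T$ is used. The output decision is then

$$\hat{y} = \arg\min_{i \in \{1,\dots,N\}} t_i,$$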

i.e., the earliest firing output neuron determines the predicted class (or chosen action) (Che et al., 2024, Göltz et al., 2019, Annamalai et al., 2024, Gardner et al., 2020).
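The decoding rule can be sketched in a few lines of Python. This is a minimal illustration, assuming spike trains are given as 0/1 lists of length $T$; the function names `first_spike_times` and `decode_first_to_spike` are hypothetical, and resolving same-step ties in favor of the lowest neuron index is an assumption the text does not specify.

```python
from typing import List, Sequence

def first_spike_times(spikes: Sequence[Sequence[int]], T: int) -> List[int]:
    """Return t_i for every output neuron, or T if neuron i never fires in [0, T-1]."""
    times = []
    for train in spikes:
        t_first = T  # default assignment when no spike occurs
        for t in range(T):
            if train[t] == 1:
                t_first = t
                break
        times.append(t_first)
    return times

def decode_first_to_spike(spikes: Sequence[Sequence[int]], T: int) -> int:
    """argmin over first-spike times; min() keeps the lowest index on ties."""
    times = first_spike_times(spikes, T)
    return min(range(len(times)), key=lambda i: times[i])

S = [
    [0, 0, 0, 1, 0],  # neuron 0 fires at t = 3
    [0, 1, 0, 0, 0],  # neuron 1 fires at t = 1
    [0, 0, 0, 0, 0],  # neuron 2 is silent, so it gets the default T = 5
]
print(first_spike_times(S, T=5))      # [3, 1, 5]
print(decode_first_to_spike(S, T=5))  # 1
```

If every neuron is silent, all $t_i = T$ and this sketch returns neuron 0; a deployed system might instead report that no decision was reached.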

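To facilitate gradient-based training, first-spike times $t_i$ are often mapped onto scores using an exponentially decaying kernel:

$$s_i = \lambda^{t_i},$$

with decay factor $0 < \lambda < 1$. Because $\lambda^{t}$ is strictly decreasing in $t$, the highest-scoring output $\arg\max_i s_i$ is exactly the earliest-spiking neuron $\arg\min_i t_i$, so score-based decoding is equivalent to the earliest-spike rule (Che et al., 2024).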
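The equivalence between the score rule and the earliest-spike rule can be checked numerically. This is a minimal sketch; the helper names and the decay factors used below are illustrative assumptions rather than values from the cited work.

```python
def exp_scores(times, lam=0.9):
    """s_i = lam ** t_i with 0 < lam < 1 (the default 0.9 is only illustrative)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("decay factor must lie in (0, 1)")
    return [lam ** t for t in times]

def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])

def argmin(xs):
    return min(range(len(xs)), key=lambda i: xs[i])

times = [3, 1, 5]                  # first-spike times; 5 is the default T of a silent neuron
scores = exp_scores(times, lam=0.5)
print(scores)                      # [0.125, 0.5, 0.03125]
assert argmax(scores) == argmin(times) == 1
```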

2. Learning Objectives and Gradient Formulations

Training SNNs with first-to-spike decoding is challenging because spike events are non-differentiable and because the output signal is extremely sparse (at most one spike per output neuron). Several approaches have been proposed:

Softmax-cross-entropy on temporal scores:

The output scores $s_i$ are passed through a softmax, and the cross-entropy is minimized:

$$p_i = \frac{\exp(s_i)}{\sum_{j=1}^{N} \exp(s_j)},$$

$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{i=1}^{N} y_i \log p_i,$$

where $y$ is the one-hot target and $\theta$ denotes the weight parameters. This loss encourages early firing for the target output neuron and delayed (or absent) spikes for the others, leveraging exponential temporal weighting (Che et al., 2024, Göltz et al., 2019).
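The following sketch computes the score-based cross-entropy for one sample using only the Python standard library. The function names and the value `lam=0.9` are hypothetical; the comparison at the end illustrates that the loss is lower when the target neuron fires first.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def score_cross_entropy(times, target, lam=0.9):
    """Cross-entropy of softmax(s) with s_i = lam ** t_i and a one-hot target."""
    scores = [lam ** t for t in times]
    p = softmax(scores)
    return -math.log(p[target])  # the one-hot y keeps only the target term

early = score_cross_entropy([1, 4, 6], target=0)  # target neuron fires first
late = score_cross_entropy([6, 4, 1], target=0)   # target neuron fires last
print(early < late)  # True
```

Because each $s_i = \lambda^{t_i}$ lies in $(0, 1]$, the logits span an interval of width less than one, so this softmax stays fairly flat and the loss stays bounded away from zero; this follows directly from the formula above.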

First-spike margin or log-sum-exp loss:

To explicitly enforce that the correct output spikes first, a smooth surrogate margin loss can be employed:

$$\mathcal{L}_{\mathrm{margin}} = \frac{1}{\beta}\log\Big(1 + \sum_{j \neq c} \exp\big(\beta\,(t_c - t_j)\big)\Big),$$

where $c$ is the target neuron and $\beta > 0$ controls the margin softness (as $\beta \to \infty$ the loss approaches the hinge $\max\big(0, \max_{j \neq c}(t_c - t_j)\big)$). This pushes the correct output spike time earlier relative to the others (Göltz et al., 2019, Gardner et al., 2020).
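A minimal sketch of this margin loss, computed with a numerically stable log-sum-exp in which the leading exponent 0 stands for the constant 1 inside the logarithm. The function name, the spike times, and the $\beta$ values are illustrative assumptions.

```python
import math

def first_spike_margin_loss(times, target, beta=1.0):
    """(1/beta) * log(1 + sum_{j != c} exp(beta * (t_c - t_j))), with c = target."""
    t_c = times[target]
    # exponents of the log-sum-exp; 0.0 represents the constant term 1 = exp(0)
    terms = [0.0] + [beta * (t_c - t_j) for j, t_j in enumerate(times) if j != target]
    m = max(terms)
    return (m + math.log(sum(math.exp(a - m) for a in terms))) / beta

times = [2.0, 5.0, 7.0]
print(first_spike_margin_loss(times, target=0))             # small: the target fires first
print(first_spike_margin_loss(times, target=1))             # large: neuron 0 beats the target by 3
print(first_spike_margin_loss(times, target=1, beta=50.0))  # close to the hinge value 3.0
```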

Negative-time softmax loss:

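A closely related "negative-time" softmax is applied directly to spike times for maximum-likelihood estimation:

$$p_i = \frac{\exp(-t_i/\tau)}{\sum_{j=1}^{N} \exp(-t_j/\tau)}, \qquad \mathcal{L} = -\log p_c,$$

with temperature $\tau > 0$ and target neuron $c$. Since $\exp(-t_i/\tau) = \lambda^{t_i}$ with $\lambda = e^{-1/\tau}$, this normalizes the exponentially decayed scores directly instead of passing them through a second exponential. This form is used in direct-gradient frameworks where $t_i$ is fully differentiable with respect to the network weights (Annamalai et al., 2024, Lu et al., 24 Mar 2026).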
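For the negative-time softmax loss $\mathcal{L} = -\log p_c$ with $p_i \propto \exp(-t_i/\tau)$, writing $\mathcal{L} = t_c/\tau + \log\sum_j \exp(-t_j/\tau)$ and differentiating gives $\partial \mathcal{L}/\partial t_i = (\delta_{ic} - p_i)/\tau$. The sketch below computes the loss and this gradient and checks it against central finite differences; the function names, $\tau$, and the spike times are illustrative assumptions.

```python
import math

def neg_time_softmax(times, tau=1.0):
    """p_i proportional to exp(-t_i / tau), computed stably."""
    z = [-t / tau for t in times]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def neg_time_loss_and_grad(times, target, tau=1.0):
    p = neg_time_softmax(times, tau)
    loss = -math.log(p[target])
    # dL/dt_i = (delta_{i,target} - p_i) / tau
    grad = [((1.0 if i == target else 0.0) - p_i) / tau for i, p_i in enumerate(p)]
    return loss, grad

times = [2.3, 1.7, 3.1]  # continuous-valued spike times
loss, grad = neg_time_loss_and_grad(times, target=0, tau=0.5)

eps = 1e-6  # central finite-difference check of the analytic gradient
for i in range(len(times)):
    up, dn = times.copy(), times.copy()
    up[i] += eps
    dn[i] -= eps
    fd = (neg_time_loss_and_grad(up, 0, 0.5)[0] - neg_time_loss_and_grad(dn, 0, 0.5)[0]) / (2 * eps)
    assert abs(fd - grad[i]) < 1e-5

print(grad[0] > 0)  # True: the loss drops if the target spikes earlier
```

The gradient components sum to zero, so delaying a competitor lowers the loss exactly as much as advancing the target by the same amount.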

3. Implementation in Neuron and Network Models

First-to-spike decoding is compatible with a range of neuron and network models:

Leaky and Non-leaky Integrate-and-Fire (LIF/IF) neurons: Closed-form solutions for the time of threshold crossing are available, especially under single-spike constraints. Non-leaky models further simplify analytical gradients for TTFS training (Annamalai et al., 2024, Che et al., 2024, Park et al., 2020).
Probabilistic LIF or GLM neurons: For stochastic neurons, spike generation in each time step is Bernoulli-distributed given the membrane potential. First-to-spike decoding is then defined through the probability that the correct neuron fires before any other output neuron has fired (Jiang et al., 2024, Bagheri et al., 2017, Rosenfeld et al., 2018).
Multi-layer and deep architectures: TTFS coding has been used in feedforward, convolutional, and VGG-like deep SNNs, often with architectural or kernel-based adaptations to ease training and propagation of single spikes (Park et al., 2020, Lu et al., 24 Mar 2026).
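For the stochastic case, the probability that the correct neuron fires first can be computed in a single pass over time. This is a minimal sketch, assuming the per-step firing probabilities are already available as hazards (the probability of firing at step $t$ given no earlier spike) and that draws are independent across neurons; the convention that a same-step tie does not count as a correct first spike, and all function names, are assumptions for illustration.

```python
import math
from typing import Sequence

def prob_target_first(p: Sequence[Sequence[float]], target: int) -> float:
    """P(target fires at some step t while every other output neuron stays silent through t).

    p[i][t] is a hypothetical per-step firing probability of neuron i at step t,
    given that it has not fired earlier (e.g. a sigmoid of its membrane potential).
    """
    n, T = len(p), len(p[0])
    survive = [1.0] * n  # survive[i] = P(neuron i silent on steps 0..t-1)
    total = 0.0
    for t in range(T):
        term = survive[target] * p[target][t]  # target's first spike is at step t
        for j in range(n):
            if j != target:
                term *= survive[j] * (1.0 - p[j][t])  # neuron j silent through step t
        total += term
        for i in range(n):
            survive[i] *= 1.0 - p[i][t]
    return total

def first_spike_nll(p, target):
    """Negative log-likelihood of the event 'target spikes first'."""
    return -math.log(prob_target_first(p, target))

p = [[0.5, 0.5],  # output neuron 0
     [0.2, 0.2]]  # output neuron 1
print(prob_target_first(p, 0), prob_target_first(p, 1))  # approximately 0.56 and 0.14
```

In this example the remaining probability mass of 0.30 splits into same-step ties (0.14) and trials with no output spike at all (0.16). Minimizing `first_spike_nll` corresponds to a maximum-likelihood objective for the event "the correct neuron spikes first".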

A common training sequence is: simulate the network for $T$ timesteps, extract the first-spike times $t_i$, compute the loss, obtain gradients (sometimes via BPTT with surrogate gradients), and update the parameters (e.g., with Adam) (Che et al., 2024, Göltz et al., 2019, Annamalai et al., 2024).
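The training sequence can be sketched end to end for a single output layer. To keep the example short and differentiable without autograd, this sketch swaps the time-stepped simulation for the closed-form spike time of a non-leaky IF neuron driven by step currents, $t_i = (\vartheta + \sum_{j \in C_i} w_{ij} x_j) / \sum_{j \in C_i} w_{ij}$, where $x_j$ are input spike times and $C_i$ is the set of inputs arriving before $t_i$. It uses the negative-time softmax loss and plain gradient descent instead of Adam. The neuron model, the constants `THETA` and `T_MAX`, the zero-gradient treatment of silent neurons, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

THETA = 1.0   # firing threshold (illustrative value)
T_MAX = 10.0  # default time assigned to a silent neuron (illustrative value)

def if_spike_time(w: Sequence[float], x: Sequence[float]) -> Tuple[float, List[int]]:
    """First threshold crossing of a non-leaky IF neuron with step-current synapses.

    V(t) = sum over inputs with x_j < t of w_j * (t - x_j) is linear between input
    spikes, so the crossing on each interval has a closed form.
    Returns (t_out, causal input indices), or (T_MAX, []) if the neuron never fires.
    """
    order = sorted(range(len(x)), key=lambda j: x[j])
    w_sum = wx_sum = 0.0
    for k, j in enumerate(order):
        w_sum += w[j]
        wx_sum += w[j] * x[j]
        if w_sum <= 0.0:
            continue  # potential is not rising on this interval
        t_out = (THETA + wx_sum) / w_sum
        next_x = x[order[k + 1]] if k + 1 < len(order) else math.inf
        if t_out <= next_x:  # crossing falls inside the current interval
            return t_out, order[: k + 1]
    return T_MAX, []

def loss_and_grads(W, x, target, tau=1.0):
    """Negative-time softmax loss and its gradient with respect to every weight."""
    outs = [if_spike_time(w_i, x) for w_i in W]
    z = [-t / tau for t, _ in outs]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    p = [v / s for v in e]
    loss = -math.log(p[target])
    grads = []
    for i, (t_i, causal) in enumerate(outs):
        dL_dt = ((1.0 if i == target else 0.0) - p[i]) / tau
        w_sum = sum(W[i][j] for j in causal)
        g = [0.0] * len(x)  # silent neurons (empty causal set) get zero gradient
        for j in causal:
            g[j] = dL_dt * (x[j] - t_i) / w_sum  # dt_i/dw_ij = (x_j - t_i) / sum of causal weights
        grads.append(g)
    return loss, grads

def train_step(W, x, target, lr=0.1):
    """One plain gradient-descent update (stand-in for Adam)."""
    loss, grads = loss_and_grads(W, x, target)
    new_W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grads)]
    return new_W, loss

x = [0.0, 0.5, 1.0]                     # input spike times
W = [[0.4, 0.4, 0.4], [0.6, 0.3, 0.2]]  # 2 output neurons x 3 inputs
target = 0                              # initially neuron 1 fires first
losses = []
for _ in range(5):
    W, loss = train_step(W, x, target)
    losses.append(loss)
final_loss, _ = loss_and_grads(W, x, target)
times = [if_spike_time(w_i, x)[0] for w_i in W]
print(final_loss < losses[0], times.index(min(times)) == target)  # True True
```

Holding $C_i$ fixed, differentiating $t_i \sum_{j \in C_i} w_{ij} = \vartheta + \sum_{j \in C_i} w_{ij} x_j$ gives $\partial t_i/\partial w_{ij} = (x_j - t_i)/\sum_{k \in C_i} w_{ik}$, which is negative for every causal input, so increasing a causal weight always advances the spike. Silent neurons receive no gradient in this sketch, which illustrates why outputs that never fire can stall training.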

4. Advantages and Trade-offs Relative to Rate Coding

First-to-spike decoding exhibits a number of empirically and theoretically validated advantages over conventional rate-based spike decoding:

Property	First-to-spike	Rate-based
Spiking sparsity	At most one spike per neuron	Many spikes per neuron
Latency	Early decision (as soon as any output fires)	Must wait full window
Energy efficiency	Fewer total spikes, fewer updates	Higher dynamic power and memory access
Accuracy	Comparable; can match or exceed rate code with proper loss/training	Matches ANNs with sufficient time
Gradient stability	Can be unstable (single spike constraint)	More graded, but slower inference

First-to-spike coding substantially reduces both latency and total spike count. For example, in SNNs for MNIST and DVS Gesture, TTFS inference has been reported to require very few average timesteps and considerably less energy than rate coding, at equivalent or superior accuracy (Che et al., 2024, Jiang et al., 2024, Lu et al., 24 Mar 2026).

Trade-offs include:

Increased sensitivity to hardware device variation (since one spike can dominate the outcome) (Oh et al., 2020).
Potential training instabilities, addressed via normalization and architecture design (e.g., average pooling preserves the single-spike regime, whereas max pooling may violate it) (Che et al., 2024).
Greater sensitivity to input noise, though stochastic variants and adaptive temporal supervision can mitigate this (Jiang et al., 2024, Lu et al., 24 Mar 2026).

5. Training Algorithms and Optimization Techniques

Training procedures for first-to-spike decoding depend on the neuron model:

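BPTT with surrogate gradients: For networks whose spike generation is a non-differentiable threshold-crossing operation $S[t] = H(V[t] - V_{\mathrm{th}})$, with $H$ the Heaviside step, a continuous surrogate $\sigma$ (e.g., a straight-through estimator or a piecewise linear approximation of the Heaviside) is substituted so that $\partial S[t]/\partial V[t] \approx \sigma'(V[t] - V_{\mathrm{th}})$ in the backward pass (Che et al., 2024, Lu et al., 24 Mar 2026).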
Closed-form gradient equations: Where spike times are differentiable with respect to membrane parameters (e.g., in non-leaky IF neurons), explicit analytic gradients are derived for the loss with respect to both spike times and weights, allowing direct application of SGD or Adam (Annamalai et al., 2024, Göltz et al., 2019, Park et al., 2020).
Maximum-likelihood and policy-gradient methods: In probabilistic first-to-spike SNNs (notably with GLM neuron parameterizations) a log-likelihood is defined for the event "the correct neuron spikes first," with gradients derived from the exact distributions over spike time and output combinations. For RL, REINFORCE-style updates use the log-probability of first spike as a policy (Bagheri et al., 2017, Jiang et al., 2024, Rosenfeld et al., 2018).
Sample-adaptive and entropy-weighted losses: Recent methods weight the temporal loss mask according to per-sample confidence, using normalized entropy over outputs to encourage rapid, confident predictions but allow longer integration for uncertain cases (Lu et al., 24 Mar 2026).
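The surrogate-gradient substitution can be sketched for a single threshold unit. The forward pass uses the exact Heaviside step, while the backward pass uses either the derivative of a fast sigmoid $x/(1+k|x|)$ (a common choice in the surrogate-gradient literature, assumed here rather than taken from the cited works) or the boxcar derivative of a piecewise linear ramp. The constant `V_TH`, the slope and width values, and the function names are illustrative; a full BPTT implementation would chain these local derivatives through the membrane recursion over all timesteps.

```python
V_TH = 1.0  # firing threshold (illustrative)

def heaviside_spike(v: float) -> int:
    """Forward pass: S = H(v - V_TH)."""
    return 1 if v >= V_TH else 0

def fast_sigmoid_grad(v: float, slope: float = 5.0) -> float:
    """Backward pass: derivative of x / (1 + slope*|x|) at x = v - V_TH."""
    return 1.0 / (1.0 + slope * abs(v - V_TH)) ** 2

def ramp_grad(v: float, width: float = 1.0) -> float:
    """Backward pass: derivative of the ramp clip((v - V_TH)/width + 0.5, 0, 1)."""
    return 1.0 / width if abs(v - V_TH) < width / 2 else 0.0

w, x_in = 0.8, 1.0
v = w * x_in                                # membrane potential just below threshold
s = heaviside_spike(v)                      # forward: no spike
ds_dw_fast = fast_sigmoid_grad(v) * x_in    # dS/dw ~ sigma'(v - V_TH) * dv/dw
ds_dw_ramp = ramp_grad(v) * x_in
print(s, round(ds_dw_fast, 4), ds_dw_ramp)  # 0 0.25 1.0
```

The true derivative of the step is zero at $v = 0.8$, so without a surrogate this sub-threshold neuron would pass no learning signal back to $w$.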

6. Hardware and Neuromorphic Implementation

First-to-spike decoding is highly amenable to neuromorphic accelerators:

Event-driven computation: Because the decision is available as soon as the first output spike occurs, the hardware can halt simulation, state updates, and memory accesses for the remainder of the trial, reducing active time and thermal overhead (P et al., 2020, Oh et al., 2020, Annamalai et al., 2024).
Architectural support: TTFS-compatible SNN chips (e.g., 2D mesh NoC architectures with PE-local winner-take-all logic) can process spikes asynchronously, with inference power consumption as low as 0.734 mW and per-frame energy 32.98 μJ, outperforming rate-code chips of comparable accuracy (P et al., 2020).
Analog implementations: Nonvolatile floating-gate synapses with precise conductance tuning can realize TTFS networks, though such systems demand tight control over device non-uniformities to maintain minimal spike timing errors (Oh et al., 2020).
Implications for edge and embedded AI: The sparsity of the first-to-spike regime enables ultralow-power deployment in edge and mobile platforms, and opens new avenues for deployment in real-time robotics, event-based vision, and sensor fusion (Annamalai et al., 2024, Rosenfeld et al., 2018).
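The early-exit behavior described under event-driven computation can be sketched as a software simulation loop that stops as soon as any output neuron crosses threshold. This is a minimal sketch, assuming a discrete-time LIF output layer with leak factor `beta`, no reset (the run ends at the first spike anyway), and illustrative constants; it models the control flow only, not any particular chip.

```python
from typing import Optional, Sequence, Tuple

def run_until_first_spike(
    currents: Sequence[Sequence[float]],
    beta: float = 0.9,
    v_th: float = 1.0,
) -> Tuple[Optional[int], int]:
    """Simulate a discrete-time LIF output layer and stop at the first output spike.

    currents[t][i] is the input current to output neuron i at step t.
    Returns (index of the first neuron to fire or None, timesteps simulated).
    """
    n = len(currents[0])
    v = [0.0] * n
    for t, step_currents in enumerate(currents):
        # leaky integration: v[t] = beta * v[t-1] + I[t]
        v = [beta * v_i + c for v_i, c in zip(v, step_currents)]
        fired = [i for i in range(n) if v[i] >= v_th]
        if fired:
            # decision is available now; remaining steps are skipped.
            # Same-step ties go to the lowest index (assumption).
            return fired[0], t + 1
    return None, len(currents)

T = 20
currents = [[0.2, 0.35, 0.1]] * T  # constant drive, illustrative values
print(run_until_first_spike(currents))  # (1, 4)
```

With this drive, neuron 1 reaches threshold at the fourth step, so the remaining 16 of the 20 steps are never simulated; on event-driven hardware, that skipped work is where the savings in updates and memory accesses come from.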

7. Empirical Results, Benchmarks, and Future Directions

State-of-the-art TTFS-based SNNs now match or approach leading ANN and rate-based SNN benchmarks in accuracy while operating with markedly lower latency and energy budgets. Reported results from recent work include:

Model/dataset	Accuracy	Latency (timesteps unless noted)	Energy or spike metric (as reported)	Reference
MNIST, SNN (S-F-BPTT)	98.62%	2.03	0.11×	(Jiang et al., 2024)
CIFAR-10, VGG-16 TTFS	93.12%	1.13	0.23 sparsity	(Lu et al., 24 Mar 2026)
CIFAR-100, TTFS-SNN	68.79%	22% of burst coding	0.34% #spikes	(Park et al., 2020)
MNIST, hardware TTFS	96.90%	8	3.5× lower power	(Oh et al., 2020)

Recent advancements address training instability (initialization and normalization (Che et al., 2024)), multi-spike relaxation for deeper networks (Lu et al., 24 Mar 2026), entropy-weighted supervision for robustness, and hardware-efficient analog/digital TTFS accelerators (P et al., 2020). Future directions include robustness to adversarial and hardware variation, generalization to recurrent and convolutional TTFS-SNNs, and hybrid schemes combining rate and latency information for improved resilience under noisy or out-of-distribution inputs.