
Binary Spiking Online Optimization

Updated 23 November 2025
  • Binary Spiking Online (BSO) is an online training approach that directly updates binary weights, eliminating the latent weight storage bottleneck.
  • Its flip-signal mechanism and momentum tracking enable sign-based updates with provable sublinear regret.
  • Temporal-aware BSO (T-BSO) adapts flip thresholds to spiking dynamics, achieving competitive accuracy and memory scaling across vision, neuromorphic, and audio tasks.

The Binary Spiking Online (BSO) optimization algorithm is an online training method for Binary Spiking Neural Networks (BSNNs) designed to minimize the memory overhead inherent to conventional training methods. BSO eliminates the need for latent real-valued weight storage by directly updating binary weights through a novel flip-signal mechanism, resulting in efficient memory utilization suitable for resource-constrained systems. An extension, Temporal-aware BSO (T-BSO), adapts these flip conditions to the temporal dynamics of spiking models, further enhancing performance. Both algorithms feature provable convergence guarantees (sublinear regret bounds) and exhibit competitive or state-of-the-art accuracy and resource scaling across vision, neuromorphic, and audio benchmarks (Liang et al., 16 Nov 2025).

1. Online Training Paradigm and the Latent Weight Bottleneck

Traditional BSNN training algorithms, such as Backpropagation Through Time (BPTT), maintain dual weight representations. Binary weights $W^b \in \{-1,+1\}$ are used for inference, while latent real-valued weights $w \in \mathbb{R}$ accumulate gradients via methods like SGD or momentum. This incurs a substantial memory cost: storing latent weights scales as $\mathcal{O}(n^2 L)$ for $L$ layers of size $n \times n$, and temporal backpropagation through $T$ steps requires $\mathcal{O}(n^2 L T)$ or more.
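
As a rough sense of this scaling, the back-of-envelope sketch below compares the storage needed for a latent FP32 weight copy against 1-bit binary weights; the layer width and depth are illustrative placeholders, not values from the paper.

```python
# Back-of-envelope comparison: latent FP32 weight copy vs. 1-bit binary weights.
# Layer width n and depth L are illustrative placeholders, not values from the paper.

def weight_memory_mb(n: int, num_layers: int, bits_per_weight: float) -> float:
    """Memory for num_layers dense layers of size n x n, in megabytes."""
    num_weights = n * n * num_layers            # O(n^2 L) parameters
    return num_weights * bits_per_weight / 8 / 1e6

n, L = 1024, 8
print(f"latent FP32 copy : {weight_memory_mb(n, L, 32.0):.2f} MB")   # kept by BPTT-style training
print(f"binary weights   : {weight_memory_mb(n, L, 1.0):.2f} MB")    # BSO's working set
```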

Online training frameworks (e.g., OTTT, NDOT) rewrite the BPTT gradient as a sum of spatial and temporal terms,

$$\frac{\partial \mathcal{L}}{\partial W} = \sum_{t=1}^{T} \delta^l[t]\, \alpha^l[t]$$

where the spatial error $\delta^l[t]$ and temporal accumulator $\alpha^l[t]$ allow state-efficient updates at each time step $t$. This reduces training state to $\mathcal{O}(n^2 L + nL)$, but prior schemes still require latent continuous weights, underutilizing the binary, memory-light potential of BSNNs.
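
The practical consequence of this decomposition is that the full-sequence gradient can be formed as a running sum over time steps, so only one accumulator per layer must be kept. A minimal sketch, assuming the per-step contribution is an outer product of the spatial error and a presynaptic temporal trace (function name and shapes are illustrative, not the paper's API):

```python
import torch

def online_grad_accumulate(deltas, alphas):
    """Accumulate dL/dW as a running sum of per-step terms delta[t] (x) alpha[t].

    deltas, alphas: lists of 1-D tensors, one pair per time step. Only a single
    (out x in) accumulator is stored, instead of the full unrolled history.
    """
    grad = torch.zeros(deltas[0].numel(), alphas[0].numel())
    for delta_t, alpha_t in zip(deltas, alphas):
        grad += torch.outer(delta_t, alpha_t)   # spatial error x temporal trace
    return grad
```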

The key innovation of BSO is the complete elimination of latent weights during training, substituting direct binary weight updates controlled by a real-time flip-signal procedure.

2. Flip-Signal Mechanism in BSO

At each training iteration (layer $l$, time $t$), BSO computes the instantaneous gradient,

$$G^l[t] = \delta^l[t]\, \alpha^l[t]$$

and maintains a first-order momentum estimator,

$$M^l[t] = \beta M^l[t-1] + (1-\beta)\, G^l[t], \qquad 0 < \beta < 1.$$

A flip signal is generated for weight $W^l_{ij}$ by comparing the product $W^l_{ij} M^l_{ij}[t]$ against a threshold $y > 0$:

  • If $W^l_{ij} M^l_{ij}[t] > y$, i.e., the weight and momentum agree beyond the threshold, the momentum pushes strongly against the current sign and the weight is flipped: $W^l_{ij} \leftarrow -W^l_{ij}$;
  • If $W^l_{ij} M^l_{ij}[t] < -y$, i.e., the weight and momentum strongly disagree, the signed mask records $-1$ but the sign is retained, since the momentum already reinforces it;
  • Otherwise, $W^l_{ij}$ is left unchanged.

Equivalent weight-update rules:

  1. Signed mask for flips:

$$F^l_{ij}[t] = \mathbf{1}\!\left(W^l_{ij} M^l_{ij}[t] > y\right) - \mathbf{1}\!\left(W^l_{ij} M^l_{ij}[t] < -y\right)$$

  2. Weight update:

$$W^l[t+1] = -\mathrm{sign}\!\left(W^l[t] \odot M^l[t] - y\right) \odot W^l[t]$$

This process enables online updates of purely binary weights, requiring storage only for the sign values and a single low-precision momentum buffer. All latent real-valued weights are eliminated, dramatically reducing the training memory footprint.
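
The sketch below illustrates one BSO step for a single layer in PyTorch-style Python. It assumes the spatial error and temporal trace for the current time step are already available from the online forward/backward pass and that the instantaneous gradient takes the outer-product form above; the function name, shapes, and default hyperparameters are illustrative, not the paper's implementation.

```python
import torch

def bso_step(W, M, delta, alpha, beta: float = 0.9, y: float = 1e-3):
    """One illustrative BSO update for a single fully connected layer.

    W     : binary weights in {-1, +1}, shape (out, in)
    M     : first-order momentum buffer, same shape as W
    delta : spatial error at time t, shape (out,)
    alpha : presynaptic temporal trace at time t, shape (in,)
    """
    G = torch.outer(delta, alpha)        # instantaneous gradient G^l[t]
    M = beta * M + (1.0 - beta) * G      # momentum tracking M^l[t]
    flip = (W * M) > y                   # flip signal: weight and momentum agree beyond y
    W = torch.where(flip, -W, W)         # flip the selected signs, keep the rest
    return W, M
```

Because the update only ever negates entries of $W$, the weights remain exactly in $\{-1,+1\}$ and no real-valued latent copy is needed.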

3. Temporal-Aware BSO (T-BSO): Adapting to Temporal Nonstationarity

Spiking networks display highly nonstationary gradient statistics across time. T-BSO introduces temporal adaptivity by employing a second-order moment estimator to dynamically adjust the flip threshold:

  • First moment (momentum):

$$M^l[t] = \beta_1 M^l[t-1] + (1-\beta_1)\, G^l[t]$$

  • Second moment (variance estimate):

$$v^l[t] = \beta_2 v^l[t-1] + (1-\beta_2)\, \mathrm{mean}\!\left(G^l[t] \odot G^l[t]\right)$$

  • Adaptive temporal threshold:

$$y^l[t] = v^l[t] + \varepsilon$$

where $\varepsilon \ll 1$.

  • Weight update:

$$W^l[t+1] = -\mathrm{sign}\!\left(W^l[t] \odot M^l[t] - y^l[t]\right) \odot W^l[t]$$

This adaptive thresholding leads T-BSO to perform more frequent sign flips when variance is low and more conservative updates when variance is high, directly addressing the temporal heterogeneity of BSNN optimization. Pseudocode per layer is given in Algorithm 1 of the source paper (Liang et al., 16 Nov 2025).
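
A corresponding sketch of one T-BSO step, under the same assumptions as the BSO sketch above (illustrative names, shapes, and defaults; the second moment is kept as a single scalar per layer, following the mean reduction in the formula):

```python
import torch

def t_bso_step(W, M, v, delta, alpha,
               beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
    """One illustrative T-BSO update for a single layer.

    v is a scalar (0-dim tensor) second-moment estimate; it turns the fixed
    flip threshold y into the adaptive per-step threshold y^l[t] = v^l[t] + eps.
    """
    G = torch.outer(delta, alpha)                     # instantaneous gradient G^l[t]
    M = beta1 * M + (1.0 - beta1) * G                 # first moment M^l[t]
    v = beta2 * v + (1.0 - beta2) * (G * G).mean()    # scalar second moment v^l[t]
    y = v + eps                                       # adaptive flip threshold y^l[t]
    flip = (W * M) > y                                # flip when agreement exceeds threshold
    W = torch.where(flip, -W, W)
    return W, M, v
```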

4. Theoretical Regret Guarantees

Regret analysis is carried out under the classical online convex optimization framework. For convex loss functions $f_t$ over a bounded domain $\mathcal{X} \subset \mathbb{R}^d$, with gradients bounded as

$$\|\nabla f_t(w)\|_2 \le G, \qquad \|\nabla f_t(w)\|_{\infty} \le G_{\infty},$$

and with decaying per-step flip thresholds and momentum decay rates, and defining the regret as

$$R(T) = \sum_{t=1}^{T} f_t(w_t) - \min_{w \in \mathcal{X}} \sum_{t=1}^{T} f_t(w),$$

the following guarantees are established:

| Algorithm | Regret Bound | Regret Rate |
|-----------|--------------|-------------|
| BSO | $\mathcal{O}(\sqrt{T})$ | $R(T)/T \to 0$ |
| T-BSO | $\mathcal{O}(T^{3/4})$ | $R(T)/T \to 0$ |

Proofs leverage a convexity-based lower bound and interpret the flip-sign updates as variants of projected gradient descent with sign-restricted steps, bounding the number of flips via the (quasi-)sparse norm of the momentum buffer and the decay of thresholds. Complete derivations are provided in Appendix A of the source (Liang et al., 16 Nov 2025).
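
To make the regret-rate column concrete, dividing each bound by $T$ gives the average regret per round, which vanishes as $T \to \infty$ (here $C_1$ and $C_2$ are generic problem-dependent constants, not values from the paper):

$$R_{\mathrm{BSO}}(T) \le C_1 \sqrt{T} \;\Rightarrow\; \frac{R_{\mathrm{BSO}}(T)}{T} \le \frac{C_1}{\sqrt{T}} \to 0, \qquad R_{\mathrm{T\text{-}BSO}}(T) \le C_2\, T^{3/4} \;\Rightarrow\; \frac{R_{\mathrm{T\text{-}BSO}}(T)}{T} \le \frac{C_2}{T^{1/4}} \to 0.$$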

5. Empirical Results and Benchmarking

The empirical evaluation spans static vision, neuromorphic, and keyword-spotting audio tasks:

  • Datasets and architectures: CIFAR-10/100 and ImageNet-1K (VGG-11/16, ResNet-18); CIFAR10-DVS (neuromorphic, $T=10$); Google Speech Commands (VGG-11, $T=4$).
  • Baselines: Online training (OTTT, NDOT, SLTT), direct BSNN training (CBP, Q-SNN), and full-precision BPTT SNN.
  • Metrics: Classification accuracy, model size, training memory, time-step scaling.

Key findings include:

  • T-BSO achieves 94.3–94.9% accuracy on CIFAR-10/VGG-11 with 1.20 MB of binary weights, matching or exceeding state-of-the-art online baselines while remaining fully binary and online.
  • On CIFAR-100, T-BSO (VGG-11, $T=6$) attains 74.3%, surpassing Q-SNN by approximately 1%.
  • On ImageNet/ResNet-18 ($T=4$), T-BSO reaches 57.8% top-1 accuracy (4.22 MB), competitive with full-precision online baselines.
  • On CIFAR10-DVS, T-BSO achieves 81.0%, outperforming prior online or binary-weight SNNs.
  • Google Speech Commands performance is 96.1%, exceeding STS-T and ATIF.

Memory ablation demonstrates that BSO and T-BSO training memory remains effectively constant (approximately 3 GB) as the number of time-steps $T$ increases, in stark contrast to BPTT, which scales linearly from 2.9 GB ($T=1$) to 58 GB ($T=35$).

Additional ablations show that T-BSO consistently outperforms BSO by 0.5–1.0% as $T$ grows; both algorithms exhibit low sensitivity to threshold and momentum hyperparameters (accuracy variations of less than 1% across wide parameter ranges).

6. Significance and Context Within SNN Optimization

BSO and T-BSO advance the state of BSNN training by enabling genuinely online, memory-efficient optimization without any latent weight storage. This directly addresses the major bottleneck in deploying SNNs on resource-constrained hardware, where model and memory footprints are critical. The combination of deterministic sign-flip updates, momentum tracking, and temporal adaptivity yields provable sublinear regret and empirical performance that closely approaches or exceeds latent-weight-based baselines.

These methods unify the goals of online convex optimization with the unique architectural constraints of binary-weight SNNs, serving as references for future algorithmic design in the field (Liang et al., 16 Nov 2025).
