Bayesian Framework for Training Binary and Spiking Neural Networks
The paper presents a principled Bayesian framework for training binary neural networks (BNNs) and spiking neural networks (SNNs), which are known for their energy-efficient and biologically inspired architectures. This framework addresses the limitations of traditional surrogate gradient (SG) methods, which often lack theoretical grounding and depend on manual tuning of hyperparameters. Instead, the authors propose a probabilistic approach that enables gradient-based optimization of these non-differentiable network types without the need for normalization layers.
Key Features of the Bayesian Framework
The novel framework is built on several fundamental contributions:
Importance-Weighted Straight-Through (IW-ST) Estimators: The authors introduce the IW-ST estimators—a unified class that encompasses straight-through and continuous relaxation-based estimators. By characterizing the bias-variance trade-off inherent to these estimators, they derive a bias-minimizing objective that can be implemented using an auxiliary loss. This development represents a step towards more rigorous gradient estimation in noisy binary networks.
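To make the estimator class concrete, the sketch below shows the plain straight-through (ST) estimator that IW-ST generalizes: the forward pass applies a hard, non-differentiable threshold, while the backward pass substitutes a surrogate gradient. The clipped-identity backward rule here is a common ST variant chosen for illustration, not the paper's specific IW-ST construction.

```python
import numpy as np

def binarize_forward(x):
    # Forward pass: hard Heaviside threshold (non-differentiable).
    return (x > 0).astype(x.dtype)

def binarize_backward_st(grad_out, x, clip=1.0):
    # Straight-through backward pass: let the upstream gradient flow
    # through unchanged, but only where |x| <= clip (clipped identity).
    # The mismatch between this surrogate and the true (zero a.e.)
    # derivative is the source of the estimator's bias.
    return grad_out * (np.abs(x) <= clip)

x = np.array([-2.0, -0.5, 0.3, 1.5])
b = binarize_forward(x)                        # -> [0., 0., 1., 1.]
g = binarize_backward_st(np.ones_like(x), x)   # -> [0., 1., 1., 0.]
```

The bias-variance trade-off the paper characterizes arises from this substitution: a wider surrogate passes more gradient signal (lower variance of the learning signal, higher bias), while a narrower one does the opposite.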
Spiking Bayesian Neural Networks (SBNNs): The paper extends the Bayesian framework to spiking neural networks. Through variational inference, posterior noise is used to train both BNNs and SNNs. This approach minimizes gradient bias, regularizes parameters, and introduces dropout-like noise, effectively enabling deep residual networks to be trained without normalization layers.
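The role of posterior noise can be illustrated with a minimal stochastic binary unit: instead of a deterministic threshold, the firing decision is sampled from a Bernoulli distribution whose probability depends on the pre-activation. This is a generic noisy-neuron sketch under that assumption, not the paper's exact SBNN parameterization.

```python
import numpy as np

def stochastic_spike(u, rng):
    # Sample a binary activation ~ Bernoulli(sigmoid(u)) by thresholding
    # the pre-activation u against logistic noise. The injected noise
    # smooths the effective activation (aiding gradient estimation) and
    # acts as dropout-like regularization.
    noise = rng.logistic(size=u.shape)
    return (u + noise > 0).astype(u.dtype)

rng = np.random.default_rng(0)
spikes = stochastic_spike(np.zeros(10000), rng)
# With u = 0, each unit fires with probability sigmoid(0) = 0.5,
# so the empirical firing rate is close to 0.5.
```

Deterministic binarization is recovered in the zero-noise limit, which is why controlling the noise level (rather than fixing it by hand) is central to the framework.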
KL Divergence Term: The KL divergence term in the variational objective plays a key structural role in the Bayesian methodology. It encourages a noise level that mitigates vanishing gradients and reduces estimator bias, removing the need for traditional normalization techniques.
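As a point of reference, the sketch below computes the standard closed-form KL term for a Gaussian mean-field posterior against a Gaussian prior, the usual building block of a variational objective. The paper's specific posterior and prior choices are not given here, so treat this as an illustrative assumption: minimizing such a term pulls the posterior scale toward the prior scale, keeping the noise level away from zero.

```python
import numpy as np

def kl_gaussian(mu, sigma, prior_sigma=1.0):
    # KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over parameters.
    # The log-ratio term penalizes sigma -> 0, so the objective itself
    # maintains a nonzero posterior noise level.
    return np.sum(
        np.log(prior_sigma / sigma)
        + (sigma**2 + mu**2) / (2 * prior_sigma**2)
        - 0.5
    )

# KL is zero when posterior == prior, and grows as sigma collapses.
kl_at_prior = kl_gaussian(np.zeros(3), np.ones(3))      # -> 0.0
kl_collapsed = kl_gaussian(np.zeros(3), np.full(3, 0.1))  # > 0
```

This is the mechanism hinted at in the summary: because shrinking the noise is penalized by the KL term, the optimizer settles on a noise level that also keeps gradients flowing, which would otherwise require explicit normalization.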
Experimental Results and Implications
The framework was experimentally evaluated on benchmarks such as CIFAR-10, DVS Gesture, and SHD datasets. The results demonstrated that the proposed method matches or exceeds the performance of existing SG-based methods without relying on normalization or hand-tuned surrogates. These results underscore the potential utility of Bayesian noise as a powerful tool for training discrete and spiking networks.
The implications of this research are both practical and theoretical. Practically, it provides a viable pathway to optimize BNNs and SNNs efficiently, particularly in resource-constrained settings. Theoretically, the paper establishes a connection between the mitigation of vanishing gradients, low-bias conditions, and the KL term, contributing to the advancement of probabilistic models in neural network training.
Future Directions
The paper outlines several avenues for future research that could further bolster the framework's applicability. These include deeper analysis of the bias-variance trade-off in surrogate gradients, refinement of noise-free gradient estimators, and exploration of fully Bayesian mechanisms for regularization and normalization. Such efforts could pave the way for more robust and efficient models capable of leveraging the intrinsic benefits of binary and spiking architectures.
Overall, this research advances the training paradigm for BNNs and SNNs, setting the stage for future innovations in Bayesian neural network methodologies.