Optimal Spiking Brain Compression: Improving One-Shot Post-Training Pruning and Quantization for Spiking Neural Networks

Published 4 Jun 2025 in cs.LG and cs.NE | (2506.03996v1)

Abstract: Spiking Neural Networks (SNNs) have emerged as a new generation of energy-efficient neural networks suitable for implementation on neuromorphic hardware. As neuromorphic hardware has limited memory and computing resources, weight pruning and quantization have recently been explored to improve SNNs' efficiency. State-of-the-art SNN pruning/quantization methods employ multiple compression and training iterations, increasing the cost for pre-trained or very large SNNs. In this paper, we propose a new one-shot post-training pruning/quantization framework, Optimal Spiking Brain Compression (OSBC), that adapts the Optimal Brain Compression (OBC) method of [Frantar, Singh, and Alistarh, 2023] for SNNs. Rather than minimizing the loss on neuron input current as OBC does, OSBC achieves more efficient and accurate SNN compression in one pass by minimizing the loss on spiking neuron membrane potential with a small sample dataset. Our experiments on neuromorphic datasets (N-MNIST, CIFAR10-DVS, DVS128-Gesture) demonstrate that OSBC can achieve 97% sparsity through pruning with 1.41%, 10.20%, and 1.74% accuracy loss, or 4-bit symmetric quantization with 0.17%, 1.54%, and 7.71% accuracy loss, respectively. Code will be available on GitHub.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper introduces Optimal Spiking Brain Compression (OSBC), a framework adapting one-shot pruning and quantization for spiking neural networks using membrane potential as a loss function.
OSBC achieves up to 97% sparsity with a minor 1.74% accuracy drop on neuromorphic datasets like DVS128-Gesture, outperforming traditional methods.
The module-wise strategy and one-shot approach significantly enhance energy efficiency, making OSBC highly applicable for edge computing and robotics.

Optimal Spiking Brain Compression: Improving One-Shot Post-Training Pruning and Quantization for Spiking Neural Networks

Introduction

The paper addresses the optimization of Spiking Neural Networks (SNNs) using a new framework called Optimal Spiking Brain Compression (OSBC). With the growing demand for energy-efficient SNNs suitable for neuromorphic hardware, the OSBC framework proposes a one-shot post-training approach to pruning and quantization. This approach is designed to improve efficiency without the extensive computational cost typically associated with iterative compression methods.

Framework Overview

OSBC adapts the Optimal Brain Compression (OBC) approach for Artificial Neural Networks (ANNs) to the domain of SNNs. Unlike OBC, which focuses on minimizing the loss on neuron input current, OSBC targets the spiking neuron membrane potential. This shift is critical for improving the fidelity of SNN compression, particularly in applications using neuromorphic datasets such as N-MNIST, CIFAR10-DVS, and DVS128-Gesture. The proposed method achieves up to 97% sparsity in pruning, maintaining a minimal accuracy loss.

Membrane Potential as an Objective Function

A core innovation in OSBC is using the surrogate membrane potential as a loss function. This method respects the unique properties of SNNs, where neuron spiking is influenced by accumulated membrane potential rather than just input current. The paper demonstrates that this approach has a higher correlation with output spike train fidelity than previous methods.

Visual Representation: Matrix Visualization

The correlation between the objective function's effectiveness and output spike train fidelity is depicted through a series of figures.

Figure 1: M_j matrix visualization. S represents the spike train with spikes at red lines, the bottom square image is the visualization of the corresponding matrix M_j with $d_{time} = 10, \tau_m = 3$ .

Module-Wise Compression

OSBC introduces a module-wise compression strategy, where each module includes linear or convolutional layers followed by a LIF neuron layer and associated operations. This method streamlines the process and reduces redundancy while preserving neuron dependencies.

Experimental Evaluation

Experiments confirm OSBC's superior performance over baseline methods such as naive Magnitude-Based Pruning (MBP) and ExactOBS. The frameworks were tested across multiple neuromorphic datasets, achieving high sparsity levels with minor accuracy losses.

Comparison of Pruning Methods

Figure 2: Per-layer pruning percentage breakdown at 80\%, 90\%, 95\%, 97\%, and 98\% sparsity for three commonly used neuromorphic datasets: a) N-MNIST, b) CIFAR10-DVS, c) DVS128-Gesture.

In pruning, OSBC retained high accuracy at 97% sparsity for datasets like DVS128-Gesture, demonstrating a 1.74% accuracy drop, outperforming existing methods. Quantization results were also favorable, especially at lower bit-widths, where OSBC outshone approaches like GPTQ.

Implications and Future Directions

The advancements in OSBC are promising for the deployment of SNNs in real-time environments such as edge computing and robotics, where power efficiency is critical. Future work could explore temporal compression and enhanced quantization grids, which would further optimize SNN performance.

Conclusion

The OSBC framework presents a streamlined, efficient approach for one-shot pruning and quantization of SNNs. It exhibits notable improvements over state-of-the-art methods, particularly in maintaining accuracy at high sparsity levels. This research paves the way for the scalable deployment of SNNs in energy-constrained scenarios, expanding their applicability and practicality.

Markdown Report Issue