Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting (2202.11946v3)

Published 24 Feb 2022 in cs.NE and cs.AI

Abstract: Recently, brain-inspired spiking neural networks (SNNs) have attracted widespread research interest because of their event-driven and energy-efficient characteristics. Still, it is difficult to efficiently train deep SNNs due to the non-differentiability of their activation function, which precludes the gradient descent approaches typically used for traditional artificial neural networks (ANNs). Although the adoption of a surrogate gradient (SG) formally allows for the back-propagation of losses, the discrete spiking mechanism actually differentiates the loss landscape of SNNs from that of ANNs, preventing surrogate gradient methods from achieving accuracy comparable to that of ANNs. In this paper, we first analyze why the current direct training approach with surrogate gradients results in SNNs with poor generalizability. Then we introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in gradient descent with SG so that the training process can converge into flatter minima with better generalizability. Meanwhile, we demonstrate that TET improves the temporal scalability of SNNs and induces temporal inheritable training for acceleration. Our method consistently outperforms the SOTA on all reported mainstream datasets, including CIFAR-10/100 and ImageNet. Remarkably, on DVS-CIFAR10 we obtain 83% top-1 accuracy, an improvement of over 10% compared to the existing state of the art. Code is available at https://github.com/Gus-Lab/temporal_efficient_training.

Citations (208)

Summary

  • The paper introduces the Temporal Efficient Training (TET) approach, improving generalization and accuracy over traditional surrogate gradient methods.
  • It proposes Temporal Inheritable Training (TIT), which cuts training epochs by initializing from networks trained with shorter simulation lengths, without sacrificing performance.
  • Empirical evaluations on CIFAR-10/100, ImageNet, and DVS-CIFAR10 show consistent gains, including over 10% higher accuracy on DVS-CIFAR10 alongside improved energy efficiency.

Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting

The paper "Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting" addresses the inherent challenges in training deep Spiking Neural Networks (SNNs), which are becoming increasingly significant due to their energy-efficient operations and potential application in neuromorphic hardware. Despite their advantages, SNNs face significant obstacles, particularly regarding the non-differentiability of their activation functions, which complicates their training using traditional gradient descent methods utilized for Artificial Neural Networks (ANNs).

Key Contributions

  1. Temporal Efficient Training (TET) Approach: The authors introduce TET to overcome the shortcomings of conventional surrogate gradient (SG) training. While SG methods make backpropagation through the non-differentiable spiking function possible, they often fail to match the generalization and accuracy that ANNs reach with exact gradients. TET steers optimization toward flatter minima during training, improving the model's generalizability (a minimal SG sketch follows this list).
  2. Temporal Scalability and Inheritable Training: TET also improves the temporal scalability of SNNs and enables a training scheme termed Temporal Inheritable Training (TIT). TIT accelerates training by initializing networks from models trained with shorter simulation lengths, significantly reducing the required training epochs while maintaining performance (see the TIT sketch after this list).
  3. Empirical Validation: Extensive experiments on the mainstream datasets CIFAR-10/100 and ImageNet, and especially on the neuromorphic dataset DVS-CIFAR10, show consistent gains. Notably, the method reaches 83% top-1 accuracy on DVS-CIFAR10, more than 10% above the previous state of the art.
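
To ground the discussion, the following is a minimal sketch of SG-based direct training in PyTorch, assuming a leaky integrate-and-fire (LIF) neuron with a triangular surrogate; the names `SpikeFn` and `lif_forward` and the constants `tau` and `v_threshold` are illustrative choices, not the authors' API, and the paper's repository may use different neuron dynamics and surrogate shapes.

```python
import torch


class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a triangular surrogate gradient (one common SG choice)."""

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()  # fire when the potential crosses threshold

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Surrogate: largest near the threshold, zero far away from it.
        return grad_output * torch.clamp(1.0 - u.abs(), min=0.0)


def lif_forward(inputs, tau=2.0, v_threshold=1.0):
    """Unroll a leaky integrate-and-fire neuron over a [T, batch, features] input
    and return the spike train for all T steps (hypothetical parameter values)."""
    v = torch.zeros_like(inputs[0])
    spikes = []
    for x_t in inputs:
        v = v + (x_t - v) / tau            # leaky integration of the input current
        s = SpikeFn.apply(v - v_threshold)
        v = v * (1.0 - s)                  # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes)
```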

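TIT itself needs little machinery: because SNN weights are shared across timesteps, the simulation length T only controls how many times the network is unrolled, so weights trained at a small T can directly initialize a run at a larger T. A minimal sketch, using a plain linear layer as a hypothetical stand-in for an SNN backbone:

```python
import torch.nn as nn

model_short = nn.Linear(128, 10)   # stand-in for an SNN backbone; weights are T-agnostic
# ... train model_short with a short simulation length, e.g. T = 2 (cheap) ...

model_long = nn.Linear(128, 10)
model_long.load_state_dict(model_short.state_dict())  # inherit the short-T weights
# ... fine-tune model_long at the target length, e.g. T = 6, for far fewer epochs ...
```
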
Detailed Analysis and Impact

The paper systematically dissects the limitations of current direct-training approaches for SNNs, arguing that the discrete spiking mechanism gives SNNs a loss landscape distinct from that of ANNs, so surrogate gradients alone tend to converge to sharp, poorly generalizing minima. TET instead supervises each moment's pre-synaptic output rather than only the potential integrated over the whole simulation, compensating for the momentum lost under SG descent and steering optimization toward flatter minima.
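
The core change can be written in a few lines. The sketch below assumes the last layer's pre-synaptic outputs are collected as a [T, batch, classes] tensor; the released code may add a regularization term and differ in detail.

```python
import torch.nn.functional as F


def standard_loss(outputs, target):
    """Conventional direct training: cross-entropy on the time-averaged logits."""
    return F.cross_entropy(outputs.mean(dim=0), target)


def tet_loss(outputs, target):
    """TET: average the cross-entropy taken at every timestep instead, so each
    moment's output is supervised directly: L = (1/T) * sum_t CE(O(t), y)."""
    T = outputs.shape[0]
    return sum(F.cross_entropy(outputs[t], target) for t in range(T)) / T
```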

The TET approach not only improves predictive performance across various datasets but also enhances the operational efficiency of SNNs. The authors highlight the potential for reduced energy consumption during inference, a critical advantage in deploying SNNs in resource-constrained environments like edge computing and IoT devices.

The theoretical implications of the TET methodology extend to new avenues in AI research, particularly in bridging the gap between biological inspiration and computational implementation in neural networks. By confronting the challenge of non-differentiable functions directly, this work contributes to the broader endeavor of improving learning paradigms for unconventional neural architectures.

Future Directions

The implications for future research are substantial. The methodological advances introduced by TET open the door to deeper and more complex SNNs, potentially enabling neuromorphic systems that more closely emulate natural brain function. In addition, the reduced training time offered by TIT may allow faster iteration on SNN design and application, broadening the applicability of SNNs across sectors.

One promising direction is integrating TET with cutting-edge neuromorphic chips to optimize real-world applications in robotics and artificial intelligence. Moreover, exploring the interplay between network architecture and training dynamics in SNNs could yield new insights into efficiency and performance optimization. The paper sets a solid foundation for future exploration of large-scale, biologically inspired computing paradigms.