Deep Residual Learning in Spiking Neural Networks
This paper presents an approach to training deep Spiking Neural Networks (SNNs) through a novel architecture called the Spike-Element-Wise (SEW) ResNet. SEW ResNet addresses key limitations of previous Spiking ResNet models, which copied the architecture of Artificial Neural Networks (ANNs) but could not implement identity mapping for most neuron models because of the discrete nature of spike activations. By restoring the residual-learning principle originally demonstrated by He et al. (2015), SEW ResNet improves the training efficiency and accuracy of SNNs on complex datasets.
Analysis of Spiking ResNet Drawbacks
The paper identifies two critical drawbacks of the traditional Spiking ResNet architecture. First, Spiking ResNets cannot achieve identity mapping for all neuron models: the spiking neuron placed after the residual addition emits discrete spikes and, unlike ReLU in ANN ResNets, does not in general pass its input through unchanged. Without identity mapping, deeper networks perform worse than shallower ones as depth increases (the degradation problem).
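To make the identity-mapping obstacle concrete, here is a brief sketch; the notation is adapted for this summary rather than taken verbatim from the paper. A Spiking ResNet block computes

$$o^{l} = \mathrm{SN}\!\left(F(s^{l}) + s^{l}\right),$$

where $s^{l}$ is the input spike tensor, $F$ the residual branch, and $\mathrm{SN}$ the spiking neuron layer applied after the addition. Identity mapping requires $o^{l} = s^{l}$ whenever $F(s^{l}) = 0$, i.e., $\mathrm{SN}(s^{l}) = s^{l}$. A spiking neuron generally does not reproduce an arbitrary input spike train unless its parameters happen to satisfy that special case, whereas $\mathrm{ReLU}(x) = x$ holds for any non-negative $x$ in ANN ResNets.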
Second, Spiking ResNets suffer from vanishing/exploding gradients, caused by the multiplicative compounding of per-block gradient factors across many layers. Because spikes are binary, training relies on surrogate gradients, and these surrogate-derivative factors can drive the overall gradient magnitude toward zero or toward extreme values as depth grows.
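A rough sketch of the gradient argument (the symbols below are illustrative, not the paper's exact notation): with surrogate-gradient training, backpropagating through $k$ consecutive Spiking ResNet blocks multiplies the upstream gradient by one surrogate-derivative factor per block,

$$\frac{\partial \mathcal{L}}{\partial s^{l}} \approx \frac{\partial \mathcal{L}}{\partial s^{l+k}} \prod_{i=l}^{l+k-1} \sigma'\!\left(u^{i}\right),$$

where $\sigma'$ is the surrogate derivative of the firing function and $u^{i}$ the membrane potential of the neuron that closes block $i$. Unless each factor stays close to 1, the product decays toward 0 or grows without bound as depth increases, which is the vanishing/exploding behavior described above.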
Introduction of the SEW ResNet
To counter these issues, the authors propose the SEW ResNet. In a SEW block, the residual branch itself ends in a spiking neuron, and its spike output is combined with the block input by an element-wise function g, with ADD, AND, and IAND studied as choices of g. Each choice can realize identity mapping (ADD and IAND when the branch emits no spikes, AND when it always spikes), so the shortcut path stabilizes gradient propagation across layers and mitigates the gradient problems of prior models. Element-wise addition (ADD) is especially effective: the shortcut contributes an identity term to the gradient, and the authors verify experimentally that it maintains robust gradient flow even in very deep stacks.
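The following is a minimal PyTorch-style sketch of a SEW block, written for this summary rather than copied from the authors' code; LIFNeuron is a simplified stand-in for the spiking neuron layers, with surrogate-gradient details and the multi-step temporal dimension omitted.

```python
import torch
import torch.nn as nn

class LIFNeuron(nn.Module):
    """Simplified single-step LIF-style neuron with a hard threshold.
    Stand-in for the spiking neuron layers in the paper; surrogate-gradient
    machinery is omitted for brevity."""
    def __init__(self, tau=2.0, v_threshold=1.0):
        super().__init__()
        self.tau, self.v_threshold = tau, v_threshold

    def forward(self, x):
        # Charge the membrane potential and emit a binary spike tensor.
        v = x / self.tau
        return (v >= self.v_threshold).float()

class SEWBlock(nn.Module):
    """Spike-Element-Wise residual block (sketch).
    The residual branch ends in a spiking neuron, so both operands of the
    element-wise function g are spike tensors."""
    def __init__(self, channels, g="ADD"):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            LIFNeuron(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            LIFNeuron(),
        )
        self.g = g

    def forward(self, s):
        a = self.branch(s)           # spike output of the residual branch
        if self.g == "ADD":          # identity mapping when the branch is silent
            return a + s
        if self.g == "AND":          # identity mapping when the branch always spikes
            return a * s
        if self.g == "IAND":         # identity mapping when the branch is silent
            return (1.0 - a) * s
        raise ValueError(f"unknown element-wise function: {self.g}")
```

One design trade-off worth noting: the sum of two spike tensors under ADD can exceed 1, so the block output is no longer strictly binary, whereas AND and IAND keep the output binary.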
Empirical Evaluation
Empirical results show that SEW ResNets outperform previously reported directly trained SNNs, achieving higher classification accuracy with fewer simulation time steps. On ImageNet, SEW ResNets achieve higher training accuracy and less degradation than traditional Spiking ResNets, especially in deeper architectures with more than 100 layers. The authors attribute this to the residual learning enabled by SEW blocks, whose identity mapping keeps gradients flowing through very deep networks.
Furthermore, on neuromorphic datasets such as DVS Gesture and CIFAR10-DVS, SEW ResNets again surpass state-of-the-art SNN accuracy while using fewer parameters and simulation time steps. This underscores the efficiency of SEW blocks on event-driven temporal data, a domain where SNNs hold a natural advantage over ANNs.
Implications and Future Directions
This research has profound implications for advancing SNN architecture, bridging the performance gap between SNNs and ANNs in complex tasks. SEW ResNet offers a viable path towards scalable deep learning in spiking scenarios, promoting further exploration of biologically inspired computational models with lower energy consumption than standard ANNs.
The SEW approach sets a precedent for future research on residual learning paradigms in SNNs. By showing that SNNs benefit from deep residual frameworks, similar to their ANN counterparts, this work potentially paves the way for advanced applications of SNNs in both static and dynamic data processing, from neuromorphic computing to energy-efficient AI.
In conclusion, the advancement detailed in the paper could drive forward the development of neuromorphic computing and offer a foundation upon which further innovative SNN architectures can be built, exploring more aggressive or specialized residual linkage configurations.