
Going Deeper With Directly-Trained Larger Spiking Neural Networks (2011.05280v2)

Published 29 Oct 2020 in cs.NE and cs.AI

Abstract: Spiking neural networks (SNNs) are promising in a bio-plausible coding for spatio-temporal information and event-driven signal processing, which is very suited for energy-efficient implementation in neuromorphic hardware. However, the unique working mode of SNNs makes them more difficult to train than traditional networks. Currently, there are two main routes to explore the training of deep SNNs with high performance. The first is to convert a pre-trained ANN model to its SNN version, which usually requires a long coding window for convergence and cannot exploit the spatio-temporal features during training for solving temporal tasks. The other is to directly train SNNs in the spatio-temporal domain. But due to the binary spike activity of the firing function and the problem of gradient vanishing or explosion, current methods are restricted to shallow architectures and thereby difficult in harnessing large-scale datasets (e.g. ImageNet). To this end, we propose a threshold-dependent batch normalization (tdBN) method based on the emerging spatio-temporal backpropagation, termed "STBP-tdBN", enabling direct training of a very deep SNN and the efficient implementation of its inference on neuromorphic hardware. With the proposed method and elaborated shortcut connection, we significantly extend directly-trained SNNs from a shallow structure ( < 10 layer) to a very deep structure (50 layers). Furthermore, we theoretically analyze the effectiveness of our method based on "Block Dynamical Isometry" theory. Finally, we report superior accuracy results including 93.15 % on CIFAR-10, 67.8 % on DVS-CIFAR10, and 67.05% on ImageNet with very few timesteps. To our best knowledge, it's the first time to explore the directly-trained deep SNNs with high performance on ImageNet.

Authors (5)
  1. Hanle Zheng (3 papers)
  2. Yujie Wu (34 papers)
  3. Lei Deng (81 papers)
  4. Yifan Hu (89 papers)
  5. Guoqi Li (90 papers)
Citations (430)

Summary

Direct Training of Deep Spiking Neural Networks Through Threshold-Dependent Batch Normalization

The paper Going Deeper With Directly-Trained Larger Spiking Neural Networks by Hanle Zheng et al. presents a significant advance in the training methodology of Spiking Neural Networks (SNNs). SNNs are a promising class of neural network architectures, valued for their energy-efficient operation and natural handling of spatio-temporal data. Their adoption has lagged, however, because training them is difficult: spike activity is non-differentiable, and gradients must be backpropagated through temporal dynamics. This work introduces "STBP-tdBN" (Spatio-Temporal BackPropagation with threshold-dependent Batch Normalization), a method that enables effective direct training of deep SNNs and achieves unprecedented performance on large-scale datasets such as ImageNet.
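
To make the non-differentiability issue concrete, the sketch below shows one common way STBP-style training sidesteps it: the forward pass fires a hard binary spike, while the backward pass substitutes a rectangular surrogate derivative around the firing threshold. This is an illustrative PyTorch sketch under assumed settings; the class name, the choice of a rectangular window, and its width are assumptions, not the authors' exact formulation.

```python
import torch

V_THRESHOLD = 1.0      # assumed firing threshold
SURROGATE_WIDTH = 1.0  # assumed width of the rectangular surrogate window

class RectSurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, rectangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane):
        ctx.save_for_backward(membrane)
        # Fire a binary spike wherever the membrane potential reaches the threshold.
        return (membrane >= V_THRESHOLD).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane,) = ctx.saved_tensors
        # Let gradients pass only inside a window around the threshold;
        # outside that window the surrogate derivative is zero.
        inside = (membrane - V_THRESHOLD).abs() < (SURROGATE_WIDTH / 2)
        return grad_output * inside.float() / SURROGATE_WIDTH

spike_fn = RectSurrogateSpike.apply  # usage: spikes = spike_fn(membrane_potential)
```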

The distinctiveness of the proposed solution lies in addressing two principal challenges in SNN training: the gradient vanishing/explosion problem and the need to balance neuron inputs against firing thresholds. STBP-tdBN introduces a threshold-dependent batch normalization technique that stabilizes gradient flow and keeps pre-synaptic inputs commensurate with firing thresholds across the network. This enables direct training of very deep SNNs, extending the feasible depth from fewer than ten layers to fifty, a milestone that had previously eluded direct-training methods.
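
A minimal sketch of such a normalization layer is given below. It assumes pre-activations stacked as (T, B, C, H, W), computes per-channel statistics jointly over the time, batch, and spatial dimensions, and rescales the normalized values toward alpha * V_th instead of unit variance. The class and argument names are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class ThresholdDependentBN(nn.Module):
    """Illustrative threshold-dependent batch normalization (tdBN) sketch.

    Statistics are computed per channel over time, batch, and spatial
    dimensions, and the normalized signal is scaled toward alpha * V_th.
    """

    def __init__(self, num_channels, v_threshold=1.0, alpha=1.0, eps=1e-5):
        super().__init__()
        self.v_threshold = v_threshold
        self.alpha = alpha
        self.eps = eps
        # Learnable channel-wise affine parameters.
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x):
        # x: (T, B, C, H, W) pre-activations collected over all timesteps.
        mean = x.mean(dim=(0, 1, 3, 4), keepdim=True)
        var = x.var(dim=(0, 1, 3, 4), unbiased=False, keepdim=True)
        x_hat = self.alpha * self.v_threshold * (x - mean) / torch.sqrt(var + self.eps)
        # Channel-wise affine transform, broadcast over (T, B, H, W).
        w = self.weight.view(1, 1, -1, 1, 1)
        b = self.bias.view(1, 1, -1, 1, 1)
        return w * x_hat + b
```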

The theoretical foundation of the paper is rooted in gradient norm theory, particularly the "Block Dynamical Isometry" framework. The authors show that their approach mitigates the gradient vanishing/explosion issue by keeping gradient norms stable across the network's depth. They achieve this by scaling the variance of pre-activations in accordance with the neuron firing threshold, preserving signal integrity as information propagates through layers and timesteps.
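
Written out, the normalization described above takes roughly the following form. This is a sketch using assumed notation (V_th for the firing threshold, alpha a scaling constant, lambda and beta learnable channel-wise parameters), not a verbatim reproduction of the paper's equations.

```latex
% statistics for channel k are taken over the batch and all timesteps
\hat{x}_k = \alpha V_{th} \,
            \frac{x_k - \mathbb{E}[x_k]}{\sqrt{\mathrm{Var}[x_k] + \epsilon}},
\qquad
y_k = \lambda_k \hat{x}_k + \beta_k
```

Tying the normalized variance to the threshold V_th, rather than to 1, keeps the expected drive to each neuron commensurate with its firing threshold, which is what allows gradient norms to remain stable as depth grows.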

Empirical results reported in the paper underscore the efficacy of the method. Notably, the researchers achieve 67.05% top-1 accuracy on ImageNet using a ResNet-34 architecture with only six timesteps. This is a remarkable result given the computational efficiency of the approach, which exploits the sparsity of spikes to achieve lower computational cost than traditional ANNs of equivalent architecture. Their SNNs also attain state-of-the-art performance on neuromorphic datasets such as DVS-Gesture and DVS-CIFAR10, demonstrating that the approach can harness spatio-temporal information effectively.

The paper's contributions hold significant implications for both academic research and practical applications. The ability to train deeper SNNs unlocks new potential for complex tasks that demand both efficiency and accuracy, particularly in domains such as automated driving, robotics, and neuromorphic computing, where energy and compute resources are constrained. Moreover, the findings and methodologies presented here could stimulate further exploration of novel SNN architectures and training algorithms, fostering advances in bio-inspired computing paradigms.

In summary, the paper delivers a well-articulated approach to unlocking the power of deep SNNs through an innovative normalization technique, paving the way for more robust and scalable neuromorphic computing solutions. Future research may refine these techniques and integrate them into a broader array of neuromorphic systems, extending the reach and impact of this promising direction in artificial intelligence.