SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks (2403.14302v2)
Abstract: The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer, which combines the ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than its spiking Vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps, which is the state-of-the-art result in the SNN field.
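The abstract only sketches how a spiking self-attention block operates, so the snippet below gives a minimal, generic PyTorch illustration of the idea: queries, keys, and values are binary spike trains, so the attention product reduces to accumulating coincident spikes with no softmax, and a scaling factor keeps the accumulated values in a range the spiking neuron can re-binarize. This is a sketch under assumptions, not the paper's DSSA: the `SpikeNeuron` threshold unit, the `1/sqrt(head_dim)` scale, and the `(T, B, N, dim)` tensor layout are illustrative choices, and the surrogate gradient needed for training is omitted.

```python
import torch
import torch.nn as nn


class SpikeNeuron(nn.Module):
    """Simplified integrate-and-fire step: emit a binary spike where the input
    crosses the threshold (the surrogate gradient used for training is omitted)."""
    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x >= self.threshold).float()


class SpikingSelfAttention(nn.Module):
    """Generic spiking self-attention: Q, K, V are spike tensors, so Q.K^T and
    attn.V are accumulations of binary values (no softmax, no float multiplies
    between activations)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.spike = SpikeNeuron()
        self.scale = 1.0 / self.head_dim ** 0.5  # placeholder scale, not the paper's

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, B, N, dim) binary spikes over T time-steps
        T, B, N, dim = x.shape
        shape = (T, B, N, self.heads, self.head_dim)
        q = self.spike(self.q_proj(x)).view(shape)
        k = self.spike(self.k_proj(x)).view(shape)
        v = self.spike(self.v_proj(x)).view(shape)
        # Q.K^T entries are integer spike-coincidence counts; the scale keeps them
        # in the neuron's firing range before re-binarizing.
        attn = self.spike(torch.einsum("tbnhd,tbmhd->tbhnm", q, k) * self.scale)
        out = torch.einsum("tbhnm,tbmhd->tbnhd", attn, v).reshape(T, B, N, dim)
        return self.out_proj(self.spike(out))


if __name__ == "__main__":
    x = (torch.rand(4, 2, 16, 64) > 0.8).float()  # 4 time-steps, 16 tokens, dim 64
    print(SpikingSelfAttention(dim=64)(x).shape)  # torch.Size([4, 2, 16, 64])
```

Per the abstract, the full SpikingResformer interleaves such attention blocks with a ResNet-style multi-stage backbone for local feature extraction; the sketch above covers only the attention computation itself.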
Authors: Xinyu Shi, Zecheng Hao, Zhaofei Yu