QKFormer: Hierarchical Spiking Transformer using Q-K Attention (2403.16552v2)
Abstract: Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of the token or channel dimension through binary vectors with linear complexity. ii) We incorporate a hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representations. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, these yield QKFormer, a directly trained hierarchical spiking transformer based on Q-K attention. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, at a size comparable to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1K, substantially outperforming Spikformer by 10.84%. To the best of our knowledge, this is the first time that directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer
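To make mechanism (i) concrete, below is a minimal PyTorch sketch of the token-dimension variant of Q-K attention as described in the abstract: binary Q and K spike tensors come from linear projections, Q is summed over the channel dimension to score each token, a spiking step binarizes the scores, and the resulting mask gates K row-wise, so no N×N attention matrix is ever formed. This is an illustrative reconstruction under stated assumptions, not the official implementation; the `QKTokenAttention` class, the Heaviside threshold standing in for the paper's LIF neurons (with surrogate gradients in practice), and all layer names are assumptions.

```python
# Minimal sketch (not the official QKFormer code) of Q-K token attention.
# Assumption: a hard threshold stands in for the spiking neuron; real training
# would use a LIF neuron with a surrogate gradient (e.g., via SpikingJelly).
import torch
import torch.nn as nn


class QKTokenAttention(nn.Module):
    def __init__(self, dim: int, threshold: float = 1.0):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.threshold = threshold

    def spike(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in spiking neuron: binarize by thresholding.
        return (x >= self.threshold).float()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, B, N, D) spike input over T time steps, N tokens, D channels.
        q = self.spike(self.q_proj(x))                   # binary Q: (T, B, N, D)
        k = self.spike(self.k_proj(x))                   # binary K: (T, B, N, D)
        token_score = q.sum(dim=-1, keepdim=True)        # spike counts per token: (T, B, N, 1)
        mask = self.spike(token_score)                   # binary token-importance vector
        return mask * k                                  # gate K row-wise, broadcast over channels


if __name__ == "__main__":
    attn = QKTokenAttention(dim=64)
    x = (torch.rand(4, 2, 196, 64) > 0.8).float()        # toy binary spike input
    print(attn(x).shape)                                 # torch.Size([4, 2, 196, 64])
```

Because the token mask has shape (N, 1) and simply gates K element-wise, the cost scales as O(N·D) rather than the O(N²·D) of standard self-attention, which is the linear complexity claimed in the abstract; the channel-dimension variant would instead sum Q over tokens and gate K along the channel axis.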
- A low power, fully event-based gesture recognition system. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7243–7252, 2017.
- Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. In International Conference on Learning Representations (ICLR), 2021.
- Anthony N Burkitt. A review of the integrate-and-fire neuron model: I. homogeneous synaptic input. Biological cybernetics, 95:1–19, 2006.
- RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 702–703, 2020.
- ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
- An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
- Deep residual learning in spiking neural networks. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), volume 34, pages 21056–21069, 2021.
- Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2661–2671, October 2021.
- Bridging the gap between ANNs and SNNs by calibrating offset spikes. arXiv preprint arXiv:2302.10685, 2023.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16000–16009, 2022.
- Advancing residual learning towards powerful deep spiking neural networks. arXiv preprint arXiv:2112.08954, 2021.
- Deep networks with stochastic depth. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pages 646–661. Springer, 2016.
- Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.
- CIFAR10-DVS: An event-stream dataset for object classification. Frontiers in neuroscience, 11:309, 2017.
- A free lunch from ANN: Towards efficient, accurate spiking neural networks calibration. arXiv preprint arXiv:2106.06984, 2021.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, 2021.
- Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models. Neural networks, 10(9):1659–1671, 1997.
- Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6):51–63, 2019.
- Towards artificial general intelligence with hybrid tianjic chip architecture. Nature, 572(7767):106–111, 2019.
- Event-driven spiking convolutional neural network, June 16, 2022. US Patent App. 17/601,939.
- Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784):607–617, 2019.
- Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning (ICML), pages 10347–10357. PMLR, 2021.
- Attention is all you need. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), volume 30, 2017.
- Segregation, integration, and balance of large-scale resting brain networks configure different cognitive abilities. Proceedings of the National Academy of Sciences, 118(23):e2022288118, 2021.
- Spatial-temporal self-attention for asynchronous spiking neural networks. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), pages 3085–3093, August 2023.
- Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in neuroscience, 12:331, 2018.
- Training feedback spiking neural networks by implicit differentiation on the equilibrium state. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), volume 34, pages 14516–14528, 2021.
- Spike-driven transformer. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2023.
- Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 11062–11070, 2021.
- Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 34, pages 13001–13008, 2020.
- Spikingformer: Spike-driven residual learning for transformer-based spiking neural network, 2023.
- Enhancing the performance of transformer-based spiking neural networks by improved downsampling with precise gradient backpropagation, 2023.
- Spikformer: When spiking neural network meets transformer. In The Eleventh International Conference on Learning Representations, 2023.