
Spike-driven Transformer (2307.01694v1)

Published 4 Jul 2023 in cs.NE and cs.CV

Abstract: Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.

Spike-Driven Transformer: A Synergy of Energy Efficiency and Performance

The field of neural network architecture has continuously evolved in search of a balance between computational power, energy efficiency, and task accuracy. This paper introduces the Spike-driven Transformer, which integrates the spike-driven nature of Spiking Neural Networks (SNNs) into the Transformer framework. The work is a significant contribution to the field, addressing the long-standing challenge of bringing SNNs' low-power, bio-inspired paradigm to high-performance architectures such as Transformers.

Core Contributions

The Spike-driven Transformer distinguishes itself through four unique properties:

  1. Event-Driven Computation: No computation is triggered when an input to the Transformer is zero, which significantly improves energy efficiency.
  2. Binary Spike Communication: By leveraging binary spikes for communication, conventional matrix multiplications are transformed into sparse additions, further reducing computational load.
  3. Linear-Complexity Self-Attention: The self-attention mechanism, the core of the Transformer, is redesigned to have linear complexity in both the token and channel dimensions, offering substantial computational savings.
  4. Sparse Operations Between Spike-Form Inputs: Operations among the spike-form Query, Key, and Value use only mask and addition, eliminating the need for energy-intensive multiplication.

Central to the Spike-driven Transformer is the novel Spike-Driven Self-Attention (SDSA) mechanism, which eliminates multiplication entirely in favor of computationally efficient mask and addition operations. This adjustment achieves up to 87.2 times lower computational energy than traditional self-attention, exemplifying the model's energy-efficient nature.
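
To make this concrete, the following is a minimal NumPy sketch of a spike-driven attention step in the spirit of SDSA: mask Q with K, sum over tokens, threshold the per-channel scores into a binary gate, and use that gate to mask V. The `heaviside` stand-in for the spiking neuron layer, the `v_th` threshold, and the exact factorization are simplifying assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def heaviside(u, v_th=1.0):
    """Stand-in for a spiking neuron layer: fire (1) if the input reaches threshold."""
    return (u >= v_th).astype(np.float32)

def sdsa_sketch(Q, K, V, v_th=1.0):
    """Toy spike-driven self-attention on binary spike matrices of shape (tokens, channels).

    Because Q and K are binary, their Hadamard product is just a mask (logical AND),
    and the column sum over tokens involves only additions -- no matrix multiplication.
    """
    masked = Q * K                          # mask: element-wise AND of binary spikes
    channel_scores = masked.sum(axis=0)     # sum over tokens -> one integer score per channel
    gate = heaviside(channel_scores, v_th)  # binary channel gate from a threshold "neuron"
    return V * gate                         # mask V channel-wise; output stays binary

# Hypothetical usage with random binary spike tensors (196 tokens, 512 channels)
rng = np.random.default_rng(0)
Q, K, V = (rng.integers(0, 2, size=(196, 512)).astype(np.float32) for _ in range(3))
out = sdsa_sketch(Q, K, V, v_th=2.0)
```

Note that the cost of this sketch grows linearly with both the number of tokens and the number of channels, matching the linear-complexity claim above.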

Methodological Innovation

A noteworthy aspect of the proposed model is the rearrangement of residual connections so that they precede the spiking activation functions. This ensures that neurons communicate strictly through binary spikes, aligning closely with the spike-driven paradigm and minimizing energy consumption. With these adjustments, the Spike-driven Transformer achieves 77.1% top-1 accuracy on ImageNet-1K, a state-of-the-art result in the SNN field.
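
The toy sketch below illustrates this residual placement under simplifying assumptions: a single dense `weight` matrix stands in for a convolution-plus-BatchNorm stage, and a stateless threshold replaces full spiking-neuron dynamics. The point it demonstrates is that the shortcut joins the real-valued membrane potential before the spiking activation, so only binary spikes ever enter the weighted layers.

```python
import numpy as np

def spike(u, v_th=1.0):
    """Binary spike output of a simplified, stateless spiking neuron."""
    return (u >= v_th).astype(np.float32)

def block_with_pre_activation_shortcut(u_in, weight):
    """Toy residual block whose shortcut is added *before* the spiking activation.

    `weight` is a hypothetical stand-in for a conv/linear + BN stage. Only the
    binary spikes `s` enter the weighted layer; the residual addition happens
    on membrane potentials, keeping all inter-layer traffic spike-form.
    """
    s = spike(u_in)             # binary spikes leave the previous neuron layer
    u_out = s @ weight + u_in   # residual joins the membrane potential, pre-activation
    return u_out                # the next block applies spike(u_out)
```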

Energy Efficiency Analysis

The paper presents an in-depth energy analysis, highlighting the drastic reduction in power consumption made possible by the spike-driven paradigm. Traditional Transformer architectures rely on computationally expensive Multiply-and-Accumulate (MAC) operations, whereas the Spike-driven Transformer relies almost entirely on Accumulate (AC) operations, reducing energy usage while maintaining competitive accuracy.
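
As a rough back-of-the-envelope illustration (not the paper's exact accounting), the two regimes can be compared using the commonly cited 45 nm CMOS estimates of roughly 4.6 pJ per 32-bit floating-point MAC and 0.9 pJ per addition; the operation count, firing rate, and timestep values below are hypothetical.

```python
# Assumed per-operation energies (45 nm CMOS estimates, for illustration only)
E_MAC = 4.6e-12   # J per 32-bit floating-point multiply-accumulate
E_AC  = 0.9e-12   # J per 32-bit floating-point accumulate

def ann_energy(num_ops):
    """Dense ANN/Transformer layer: every synaptic operation is a MAC."""
    return num_ops * E_MAC

def snn_energy(num_ops, firing_rate, timesteps):
    """Spike-driven layer: only nonzero (spiking) inputs trigger an accumulate."""
    return num_ops * firing_rate * timesteps * E_AC

# e.g. a layer with 1e9 synaptic operations, 15% firing rate, 4 timesteps
ratio = ann_energy(1e9) / snn_energy(1e9, firing_rate=0.15, timesteps=4)
print(f"estimated energy reduction: {ratio:.1f}x")   # ~8.5x under these toy assumptions
```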

Implications and Future Directions

The implications of this research extend into both practical and theoretical domains. Practically, the Spike-driven Transformer could notably reduce the energy footprint of neural networks deployed on resource-constrained devices. Theoretically, it presents a compelling case for revisiting how bio-inspired principles can inform scalable, efficient network designs.

Future developments in AI could focus on enhancing the compatibility of more complex neural operations with the spike-driven approach, furthering the goal of universally integrating energy efficiency with computational efficacy. Additionally, exploring the deployment of the Spike-driven Transformer on neuromorphic hardware could yield intriguing insights into hardware-optimized deep learning frameworks.

In conclusion, the Spike-driven Transformer represents a significant advancement in low-power neural network design, effectively marrying the energy efficiency of SNNs with the high performance of Transformers. This work not only enriches the toolkit of AI researchers with a novel architectural option but also sets a precedent for future explorations in energy-efficient AI infrastructures.

Authors (7)
  1. Man Yao
  2. Jiakui Hu
  3. Zhaokun Zhou
  4. Li Yuan
  5. Yonghong Tian
  6. Bo Xu
  7. Guoqi Li
Citations (61)