
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips (2404.03663v1)

Published 15 Feb 2024 in cs.NE and cs.CV

Abstract: Neuromorphic computing, which runs Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips have been designed specifically for Transformer-based SNNs, which have only just emerged and whose performance is merely on par with CNN-based SNNs, offering no distinct advantage. In this work, we propose a general Transformer-based SNN architecture, termed "Meta-SpikeFormer", whose goals are: 1) Low power: it supports the spike-driven paradigm, in which the network performs only sparse addition; 2) Versatility: it handles various vision tasks; 3) High performance: it shows overwhelming performance advantages over CNN-based SNNs; 4) Meta-architecture: it provides inspiration for next-generation Transformer-based neuromorphic chip designs. Specifically, we extend the Spike-driven Transformer of Yao et al. (2023) into a meta architecture and explore the impact of structure, spike-driven self-attention, and skip connections on its performance. On ImageNet-1K, Meta-SpikeFormer achieves 80.0% top-1 accuracy (55M parameters), surpassing the current state-of-the-art (SOTA) SNN baseline (66M parameters) by 3.7%. This is the first directly trained SNN backbone that can simultaneously support classification, detection, and segmentation, obtaining SOTA results among SNNs. Finally, we discuss the inspiration that the meta SNN architecture offers for next-generation neuromorphic chip design. Source code and models are available at https://github.com/BICLab/Spike-Driven-Transformer-V2.
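
The abstract's central claim, that a spike-driven network computes with only sparse addition, hinges on spike-driven self-attention: queries, keys, and values are binary spike tensors, so the products of standard attention degenerate into masking and the only accumulation is a column-wise sum. The sketch below illustrates this idea in a minimal single-head form; the Heaviside neuron, the names `spike_neuron` and `sdsa`, and the tensor shapes are simplifications of ours for exposition, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of spike-driven self-attention, assuming a simple
# Heaviside spike neuron and a single head; illustrative only.
import torch

def spike_neuron(x: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    # Emit a binary spike wherever the input crosses the threshold
    # (forward pass only; training would need a surrogate gradient).
    return (x >= threshold).float()

def sdsa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (N, D) binary spike tensors (N tokens, D channels).

    Because every operand is binary, the element-wise products below
    act as masks (logical AND), and the only accumulation is a
    column-wise sum, i.e. sparse addition rather than the
    multiply-accumulate of vanilla attention.
    """
    kv = k * v                          # binary mask: 1 where k and v both spike
    attn = kv.sum(dim=0, keepdim=True)  # (1, D) column-wise sum: pure addition
    attn = spike_neuron(attn)           # re-binarize into a spike tensor
    return q * attn                     # channel-wise masking of q

# Toy usage: binarize random activations into spikes, then attend.
torch.manual_seed(0)
q = spike_neuron(torch.rand(16, 64), 0.5)  # 16 tokens, 64 channels
k = spike_neuron(torch.rand(16, 64), 0.5)
v = spike_neuron(torch.rand(16, 64), 0.5)
out = sdsa(q, k, v)
print(out.shape)  # torch.Size([16, 64]); entries are binary spikes
```

Note that the final spike neuron re-binarizes the column sums, so the block's output can feed the next spike-driven layer without ever materializing floating-point activations.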

References (91)
  1. Is space-time attention all you need for video understanding? In ICML, 2021.
  2. Hydra attention: Efficient attention with many heads. In Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII, pp.  35–49. Springer, 2023.
  3. MMDetection: OpenMMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
  4. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
  5. François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  1251–1258, 2017.
  6. Twins: Revisiting the design of spatial attention in vision transformers. Advances in Neural Information Processing Systems, 34:9355–9366, 2021.
  7. MMSegmentation Contributors. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark, 2020. URL https://github.com/open-mmlab/mmsegmentation.
  8. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99, 2018.
  9. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
  10. Optimal conversion of conventional artificial neural networks to spiking neural networks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=FZ1oTwcXchK.
  11. RepVGG: Making VGG-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742, 2021.
  12. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
  13. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88:303–338, 2010.
  14. Deep residual learning in spiking neural networks. Advances in Neural Information Processing Systems, 34:21056–21069, 2021.
  15. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  6202–6211, 2019.
  16. Bottom-up and top-down approaches for the design of neuromorphic processing systems: Tradeoffs and synergies between natural and artificial intelligence. Proceedings of the IEEE, 2023.
  17. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):154–180, 2022. doi: 10.1109/TPAMI.2020.3008413.
  18. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings, 2010.
  19. Efficient token mixing for transformers via adaptive fourier neural operators. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=EXHG-A3jlM.
  20. CMT: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12175–12185, 2022.
  21. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):87–110, 2022.
  22. Complex dynamic neurons improved spiking transformer network for efficient automatic speech recognition. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), 2023.
  23. FasterViT: Fast vision transformers with hierarchical attention. arXiv preprint arXiv:2306.06189, 2023.
  24. Identity mappings in deep residual networks. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (eds.), Computer Vision – ECCV 2016, pp.  630–645, Cham, 2016a. Springer International Publishing.
  25. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  770–778, 2016b.
  26. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, 2017.
  27. The SpiNNaker 2 processing element architecture for hybrid digital neuromorphic computing. arXiv preprint arXiv:2103.08392, 2021.
  28. Mark Horowitz. 1.1 computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.  10–14. IEEE, 2014.
  29. Fast-SNN: Fast spiking neural network by converting quantized ANN. arXiv preprint arXiv:2305.19868, 2023.
  30. Advancing spiking neural networks toward deep residual learning. IEEE Transactions on Neural Networks and Learning Systems, pp.  1–15, 2024.
  31. Neuromorphic vision sensors. Science, 288(5469):1189–1190, 2000.
  32. Transformers are rnns: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning (ICML), 2020. URL https://fleuret.org/papers/katharopoulos-et-al-icml2020.pdf.
  33. Spiking-YOLO: Spiking neural network for energy-efficient object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 34:11270–11277, Apr. 2020.
  34. Beyond classification: Directly training spiking neural networks for semantic segmentation. Neuromorphic Computing and Engineering, 2(4):044015, 2022.
  35. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  6399–6408, 2019.
  36. Online transformers with spiking neurons for fast prosthetic hand control. arXiv preprint arXiv:2303.11860, 2023.
  37. Brain inspired computing: A systematic survey and future trends. 2023.
  38. Spike calibration: Fast and accurate conversion of spiking neural network for object detection and segmentation. arXiv preprint arXiv:2207.02702, 2022.
  39. A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2):566–576, 2008.
  40. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, pp. 740–755. Springer, 2014.
  41. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  2117–2125, 2017.
  42. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  10012–10022, 2021.
  43. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  3431–3440, 2015.
  44. Neuromorphic computing chip with spatiotemporal elasticity for multi-intelligent-tasking robots. Science Robotics, 7(67):eabk2948, 2022.
  45. Wolfgang Maass. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997a.
  46. Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997b.
  47. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.
  48. Pruning convolutional neural networks for resource efficient inference. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJGCiw5gl.
  49. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6):51–63, 2019.
  50. Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Frontiers in Neuroscience, 14:653, 2020.
  51. Towards artificial general intelligence with hybrid tianjic chip architecture. Nature, 572(7767):106–111, 2019.
  52. A long short-term memory for ai applications in spike-based neuromorphic hardware. Nature Machine Intelligence, 4(5):467–479, 2022.
  53. Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1xSperKvH.
  54. Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784):607–617, 2019.
  55. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
  56. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520, 2018.
  57. Opportunities for neuromorphic computing algorithms and applications. Nature Computational Science, 2(1):10–19, 2022.
  58. Darwin: A neuromorphic hardware co-processor based on spiking neural networks. Science China Information Sciences, 59(2):1–5, 2016.
  59. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), pp.  1–14, San Diego, CA, United states, 2015.
  60. Deep directly-trained spiking neural networks for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  6555–6565, 2023.
  61. MLP-Mixer: An all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34:24261–24272, 2021.
  62. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pp. 10347–10357. PMLR, 2021.
  63. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
  64. Kronecker cp decomposition with fast multiplication for compressing rnns. IEEE Transactions on Neural Networks and Learning Systems, 34(5):2205–2219, 2023a.
  65. RIFormer: Keep your vision backbone effective but removing token mixer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14443–14452, 2023b.
  66. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  568–578, 2021a.
  67. HARDVS: Revisiting human activity recognition with dynamic vision sensors. arXiv preprint arXiv:2211.09648, 2022.
  68. Action-net: Multipath excitation for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  13214–13223, 2021b.
  69. ResNet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476, 2021.
  70. Progressive tandem learning for pattern recognition with deep spiking neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7824–7840, 2021.
  71. Efficient visual recognition: A survey on recent advances and brain-inspired methodologies. Machine Intelligence Research, 19(5):366–411, 2022.
  72. Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12:331, 2018.
  73. Early convolutions help transformers see better. Advances in Neural Information Processing Systems, 34:30392–30400, 2021.
  74. Evo-ViT: Slow-fast token evolution for dynamic vision transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 2964–2972, 2022.
  75. Lead federated neuromorphic learning for wireless edge artificial intelligence. Nature Communications, 13(1):4269, 2022.
  76. Temporal-wise attention spiking neural networks for event streams classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  10221–10230, 2021.
  77. Inherent redundancy in spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  16924–16934, 2023a.
  78. Spike-driven transformer. In Thirty-seventh Conference on Neural Information Processing Systems, 2023b. URL https://openreview.net/forum?id=9FmolyOHi5.
  79. Sparser spiking activity can be better: Feature refine-and-mask spiking neural network for event-based visual recognition. Neural Networks, 166:410–423, 2023c.
  80. Attention spiking neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9393–9410, 2023d.
  81. Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks. Nature Machine Intelligence, 3(10):905–913, 2021.
  82. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  10819–10829, 2022a.
  83. Metaformer baselines for vision. arXiv preprint arXiv:2210.13452, 2022b.
  84. Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 558–567, 2021.
  85. ResNeSt: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2736–2746, 2022a.
  86. Direct training high-performance spiking neural networks for object recognition and detection. Frontiers in Neuroscience, 17, 2023.
  87. Spiking transformers for event-based single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8801–8810, 2022b.
  88. Spike transformer: Monocular depth estimation for spiking camera. In European Conference on Computer Vision, pp.  34–52. Springer, 2022c.
  89. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.  11062–11070, 2021.
  90. Scene parsing through ADE20K dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 633–641, 2017.
  91. Spikformer: When spiking neural network meets transformer. In The Eleventh International Conference on Learning Representations, 2023.
Authors (8)
  1. Man Yao (18 papers)
  2. Jiakui Hu (11 papers)
  3. Tianxiang Hu (13 papers)
  4. Yifan Xu (92 papers)
  5. Zhaokun Zhou (22 papers)
  6. Yonghong Tian (184 papers)
  7. Bo Xu (212 papers)
  8. Guoqi Li (90 papers)
Citations (29)