ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer (2402.12820v1)
Abstract: Stochastic computing (SC) has emerged as a promising computing paradigm for neural acceleration. However, how to accelerate the state-of-the-art Vision Transformer (ViT) with SC remains unclear. Unlike convolutional neural networks, ViTs introduce notable compatibility and efficiency challenges because of their nonlinear functions, e.g., softmax and the Gaussian Error Linear Unit (GELU). In this paper, for the first time, a ViT accelerator based on end-to-end SC, dubbed ASCEND, is proposed. ASCEND co-designs the SC circuits and ViT networks to enable accurate yet efficient acceleration. To overcome the compatibility challenges, ASCEND proposes a novel deterministic SC block for GELU and leverages an SC-friendly iterative approximation algorithm to design an accurate and efficient softmax circuit. To improve inference efficiency, ASCEND develops a two-stage training pipeline to produce accurate low-precision ViTs. With extensive experiments, we show the proposed GELU and softmax blocks achieve 56.3% and 22.6% error reduction compared to existing SC designs, respectively, and reduce the area-delay product (ADP) by 5.29x and 12.6x, respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND also achieves significant accuracy improvements on CIFAR10 and CIFAR100.
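To make the two ideas the abstract leans on more concrete, the sketch below shows, in plain software rather than hardware, (1) stochastic-computing multiplication of unipolar bitstreams via a bitwise AND and (2) an "SC-friendly" softmax that avoids native exp and division by using only adds and multiplies (an iterative exponential approximation plus a Newton-Raphson reciprocal). This is a minimal illustration under assumed parameters; the function names, bitstream lengths, and iteration counts are my own choices and are not ASCEND's circuits or the paper's algorithm.

```python
# Illustrative sketch only -- not ASCEND's hardware design.
# (1) Unipolar SC multiplication: values in [0, 1] become random bitstreams,
#     and a bitwise AND of independent streams decodes to their product.
# (2) An adds-and-multiplies-only softmax: exp(x) ~ (1 + x / 2^t)^(2^t) via
#     repeated squaring, and the normalizer's reciprocal via Newton-Raphson.
import numpy as np

def sc_encode(p: float, length: int, rng: np.random.Generator) -> np.ndarray:
    """Encode a probability p in [0, 1] as a random bitstream of given length."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a_bits: np.ndarray, b_bits: np.ndarray) -> float:
    """Unipolar SC multiplication: bitwise AND, decoded as the ones-density."""
    return float(np.mean(a_bits & b_bits))

def approx_softmax(x: np.ndarray, exp_iters: int = 8, div_iters: int = 12) -> np.ndarray:
    """Softmax built from adds/multiplies only (illustrative iteration counts)."""
    # exp(z) ~ (1 + z / 2^t)^(2^t): t squarings instead of a native exponential.
    y = 1.0 + (x - x.max()) / (2 ** exp_iters)
    for _ in range(exp_iters):
        y = y * y
    s = y.sum()
    # Newton-Raphson reciprocal of the normalizer: r <- r * (2 - s * r).
    r = 1.0 / (2 ** np.ceil(np.log2(s)))  # crude initial guess, guarantees convergence
    for _ in range(div_iters):
        r = r * (2.0 - s * r)
    return y * r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = 0.75, 0.40
    prod = sc_multiply(sc_encode(a, 4096, rng), sc_encode(b, 4096, rng))
    print(f"SC product ~ {prod:.3f} (exact {a * b:.3f})")
    logits = np.array([1.0, 2.0, 0.5, -1.0])
    print("approx softmax:", np.round(approx_softmax(logits), 4))
    print("numpy softmax :", np.round(np.exp(logits) / np.exp(logits).sum(), 4))
```

Longer bitstreams shrink the stochastic multiplication error at the cost of latency, which is the accuracy/efficiency trade-off the abstract's ADP comparisons speak to; the iterative softmax likewise trades iteration count for approximation error.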