Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization (2208.05163v1)
Abstract: Vision transformers (ViTs) have emerged with significantly improved accuracy on computer vision tasks. However, their complex architecture and enormous computation/storage demands create an urgent need for new hardware accelerator design methodologies. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework to explore model quantization. Compared with state-of-the-art ViT quantization work (an algorithm-only approach without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy at the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves an approximately 5.6x improvement in frame rate (56.8 FPS vs. 10.0 FPS) with a 0.71% accuracy drop on the ImageNet dataset for DeiT-base.
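The mixed-scheme quantization at the core of the framework combines quantization schemes that map onto different FPGA resources. Below is a minimal illustrative sketch in NumPy, assuming the mix pairs uniform fixed-point quantization (which maps well to DSP slices) with power-of-two quantization (where multiplications reduce to bit shifts on LUT logic) and assigns a scheme per output channel. The function names, the `pot_ratio` parameter, and the error-based channel assignment are hypothetical illustrations, not the paper's actual algorithm, which additionally ties the mixing ratio to FPGA resource utilization.

```python
import numpy as np

def quantize_fixed_point(w, bits=4):
    # Uniform symmetric fixed-point quantization (DSP-friendly on FPGA).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def quantize_power_of_two(w, bits=4):
    # Power-of-two quantization: each weight becomes +/- 2^e or 0, so a
    # multiplication becomes a bit shift (LUT-friendly on FPGA).
    n_levels = 2 ** (bits - 1) - 1
    e_max = np.floor(np.log2(np.max(np.abs(w))))
    levels = 2.0 ** (e_max - np.arange(n_levels))  # descending PoT levels
    mag = np.abs(w)
    # Snap each magnitude to the nearest power-of-two level.
    idx = np.argmin(np.abs(mag[..., None] - levels), axis=-1)
    q = np.sign(w) * levels[idx]
    q[mag < levels[-1] / 2] = 0.0  # small weights round to zero
    return q

def mixed_scheme_quantize(weight, pot_ratio=0.5, bits=4):
    # Hypothetical per-channel scheme assignment: the fraction `pot_ratio`
    # of output channels best fit by power-of-two levels uses PoT; the
    # rest uses fixed-point, keeping one uniform bit-width throughout.
    err = [np.mean((row - quantize_power_of_two(row, bits)) ** 2)
           for row in weight]
    pot_rows = set(np.argsort(err)[:int(pot_ratio * weight.shape[0])])
    out = np.empty_like(weight)
    for i, row in enumerate(weight):
        out[i] = (quantize_power_of_two(row, bits) if i in pot_rows
                  else quantize_fixed_point(row, bits))
    return out

w = np.random.randn(8, 16).astype(np.float32)
wq = mixed_scheme_quantize(w, pot_ratio=0.5, bits=4)
print("mean squared quantization error:", np.mean((w - wq) ** 2))
```

In this sketch, both schemes share the same bit-width, matching the paper's framing of accuracy comparisons "at the same bit-width"; what differs per channel is only which hardware resource the resulting arithmetic consumes.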
- Zhengang Li
- Mengshu Sun
- Alec Lu
- Haoyu Ma
- Geng Yuan
- Yanyue Xie
- Hao Tang
- Yanyu Li
- Miriam Leeser
- Zhangyang Wang
- Xue Lin
- Zhenman Fang