Faster Inference of Integer SWIN Transformer by Removing the GELU Activation (2402.01169v1)

Published 2 Feb 2024 in cs.CV and cs.AI

Abstract: The Swin Transformer is a prominent vision transformer model that achieves state-of-the-art accuracy on image classification tasks. Despite this success, its unique architecture makes inference slower than that of comparable deep neural networks. Integer quantization of the model is one way to improve its inference latency; however, state-of-the-art methods have not been able to fully quantize the model. In this work, we improve upon the inference latency of the state-of-the-art methods by removing the floating-point operations associated with the GELU activation in the Swin Transformer. While previous work proposed replacing the non-integer operations with linear approximation functions, we propose replacing GELU with the ReLU activation. The advantage of ReLU over previous methods is its low memory and computational complexity. We use iterative knowledge distillation to compensate for the accuracy lost by replacing GELU with ReLU. We quantize our GELU-less Swin Transformer and show that, on an NVIDIA RTX 4090 GPU, we can improve the inference latency of the quantized Swin Transformer by at least $11\%$ while keeping the accuracy drop under $0.5\%$ on the ImageNet evaluation dataset.
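The recipe described in the abstract, swapping every GELU activation for ReLU and then distilling the modified model against the original to recover accuracy, can be sketched in a few lines of PyTorch. The sketch below is illustrative and is not the authors' code: the module-replacement helper, the single-step distillation loss, and hyperparameters such as the temperature are assumptions, and the paper's iterative distillation schedule may differ in detail.

```python
# Minimal sketch (assumptions, not the authors' implementation) of the two steps
# the abstract describes: (1) replace GELU activations with ReLU in a Swin-style
# model, (2) run a knowledge-distillation update so the ReLU "student" tracks the
# original GELU "teacher". Names and hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def replace_gelu_with_relu(model: nn.Module) -> nn.Module:
    """Return a copy of `model` whose nn.GELU modules are swapped for nn.ReLU."""
    student = copy.deepcopy(model)
    for module in student.modules():
        for child_name, child in module.named_children():
            if isinstance(child, nn.GELU):
                setattr(module, child_name, nn.ReLU(inplace=True))
    return student

def distillation_step(student, teacher, images, optimizer, temperature=2.0):
    """One distillation update: match student logits to the frozen teacher's logits."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice one would presumably start from a pretrained Swin Transformer, fine-tune the ReLU student with this distillation loss on ImageNet, and only then apply the integer quantization step whose latency the paper evaluates; with no GELU left in the network, no floating-point approximation of the activation is needed at inference time.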

Authors (5)
  1. Mohammadreza Tayaranian (6 papers)
  2. Seyyed Hasan Mozafari (2 papers)
  3. James J. Clark (32 papers)
  4. Brett Meyer (5 papers)
  5. Warren Gross (7 papers)