Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems (2303.12557v3)

Published 22 Mar 2023 in cs.CV and cs.AI

Abstract: Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training quantization (PTQ) methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributed to the four following challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters (<5M). To overcome these challenges, we propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). We achieve a significant improvement of 17.73% for 8-bit and 29.75% for 6-bit on average, respectively, compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT). We plan to release our code at https://gitlab.com/ones-ai/q-hyvit.
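
The first two failure modes named in the abstract, highly dynamic ranges and zero-point overflow, can be made concrete with a minimal sketch of generic asymmetric uniform quantization with min-max calibration. This illustrates the baseline PTQ behavior the paper critiques, not the Q-HyViT method itself; the function names and the synthetic activation statistics below are assumptions chosen for demonstration.

```python
import numpy as np

def minmax_quant_params(x, n_bits=8):
    """Scale and zero-point for unsigned asymmetric uniform quantization (min-max calibration)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point, (qmin, qmax)

def fake_quant(x, scale, zero_point, qrange):
    """Quantize then dequantize, clamping both the zero-point and the integer codes."""
    qmin, qmax = qrange
    zp = int(np.clip(zero_point, qmin, qmax))        # zero-point overflow is silently clamped
    q = np.clip(np.round(x / scale) + zp, qmin, qmax)
    return (q - zp) * scale

# 1) Highly dynamic range: a few outliers inflate the scale, crushing small activations.
acts = np.concatenate([np.random.normal(0.0, 0.05, 1000), [35.0, -28.0]])
s, zp, qr = minmax_quant_params(acts)
err = np.abs(acts - fake_quant(acts, s, zp, qr)).mean()
print(f"dynamic range: scale={s:.4f}, mean |error|={err:.4f}")

# 2) Zero-point overflow: an all-positive channel yields a negative zero-point,
#    which clamping to 0 turns into systematic clipping of the largest values.
acts_pos = np.random.uniform(2.0, 10.0, 1000)
s, zp, qr = minmax_quant_params(acts_pos)
err = np.abs(acts_pos - fake_quant(acts_pos, s, zp, qr)).mean()
print(f"zero-point overflow: zero_point={zp} (outside {qr}), mean |error|={err:.4f}")
```

In the first case the outliers force a coarse step size, so the bulk of the near-zero activations collapses onto a handful of codes; in the second, the clamped zero-point shifts the representable range and clips the upper end of the distribution. These generic effects motivate the per-block scale adjustments the paper proposes.
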

References (52)
  1. S. Chen, L. Li, G. Wang, M. Pang, and C. Shen, “Federated learning with heterogeneous quantization bit allocation and aggregation for internet of things,” IEEE Internet of Things Journal, 2023.
  2. Y. Ji and L. Chen, “Fedqnn: A computation–communication-efficient federated learning framework for iot with low-bitwidth neural network quantization,” IEEE Internet of Things Journal, vol. 10, no. 3, pp. 2494–2507, 2022.
  3. Y. Liu, P. Huang, F. Yang, K. Huang, and L. Shu, “Quasyncfl: Asynchronous federated learning with quantization for cloud-edge-terminal collaboration enabled aiot,” IEEE Internet of Things Journal, 2023.
  4. K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu et al., “A survey on vision transformer,” IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 1, pp. 87–110, 2022.
  5. S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM computing surveys (CSUR), vol. 54, no. 10s, pp. 1–41, 2022.
  6. S. Mehta and M. Rastegari, “Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer,” arXiv preprint arXiv:2110.02178, 2021.
  7. S. Mehta and M. Rastegari, “Separable self-attention for mobile vision transformers,” Transactions on Machine Learning Research, 2022.
  8. R. Krishnamoorthi, “Quantizing deep convolutional networks for efficient inference: A whitepaper,” arXiv preprint arXiv:1806.08342, 2018.
  9. S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, and D. S. Modha, “Learned step size quantization,” in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020, pp. 1–12.
  10. J. Choi, Z. Wang, S. Venkataramani, P. I.-J. Chuang, V. Srinivasan, and K. Gopalakrishnan, “Pact: Parameterized clipping activation for quantized neural networks,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. [Online]. Available: https://openreview.net/forum?id=ryQu7f-RZ
  11. D. Zhang, J. Yang, D. Ye, and G. Hua, “Lq-nets: Learned quantization for highly accurate and compact deep neural networks,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 365–382.
  12. S. Jung, C. Son, S. Lee, J. Son, J.-J. Han, Y. Kwak, S. J. Hwang, and C. Choi, “Learning to quantize deep networks by optimizing quantization intervals with task loss,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4350–4359.
  13. S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, “Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients,” arXiv preprint arXiv:1606.06160, 2016.
  14. B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713.
  15. S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding,” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2016. [Online]. Available: http://arxiv.org/abs/1510.00149
  16. Z. Jiang, A. Jain, A. Liu, J. Fromm, C. Ma, T. Chen, and L. Ceze, “Automated backend-aware post-training quantization,” arXiv preprint arXiv:2103.14949, 2021.
  17. R. Banner, Y. Nahshan, and D. Soudry, “Post training 4-bit quantization of convolutional networks for rapid-deployment,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates, Inc., 2019.
  18. Y. Choukroun, E. Kravchik, F. Yang, and P. Kisilev, “Low-bit quantization of neural networks for efficient inference,” in ICCV Workshops, 2019, pp. 3009–3018.
  19. R. Zhao, Y. Hu, J. Dotzel, C. De Sa, and Z. Zhang, “Improving neural network quantization without retraining using outlier channel splitting,” in International conference on machine learning. PMLR, 2019, pp. 7543–7552.
  20. J. H. Lee, S. Ha, S. Choi, W.-J. Lee, and S. Lee, “Quantization for rapid deployment of deep neural networks,” arXiv preprint arXiv:1810.05488, 2018.
  21. A. Goncharenko, A. Denisov, S. Alyamkin, and E. Terentev, “Fast adjustable threshold for uniform neural network quantization,” International Journal of Computer and Information Engineering, vol. 13, no. 9, pp. 495–499, 2019.
  22. S. Migacz, “8-bit inference with tensorrt,” in GPU technology conference, vol. 2, no. 4, 2017, p. 5.
  23. H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, “Integer quantization for deep learning inference: Principles and empirical evaluation,” arXiv preprint arXiv:2004.09602, 2020.
  24. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
  25. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International conference on machine learning. PMLR, 2021, pp. 10347–10357.
  26. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10012–10022.
  27. D. Wu, Q. Tang, Y. Zhao, M. Zhang, Y. Fu, and D. Zhang, “Easyquant: Post-training quantization via scale optimization,” arXiv preprint arXiv:2006.16669, 2020.
  28. Y. Lin, T. Zhang, P. Sun, Z. Li, and S. Zhou, “Fq-vit: Post-training quantization for fully quantized vision transformer,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, 2022, pp. 1173–1179.
  29. Z. Yuan, C. Xue, Y. Chen, Q. Wu, and G. Sun, “Ptq4vit: Post-training quantization for vision transformers with twin uniform quantization,” in European Conference on Computer Vision. Springer, 2022, pp. 191–207.
  30. Z. Li, J. Xiao, L. Yang, and Q. Gu, “Repq-vit: Scale reparameterization for post-training quantization of vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17227–17236.
  31. Y. Chen, X. Dai, D. Chen, M. Liu, X. Dong, L. Yuan, and Z. Liu, “Mobile-former: Bridging mobilenet and transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5270–5279.
  32. Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren, “Efficientformer: Vision transformers at mobilenet speed,” Advances in Neural Information Processing Systems, vol. 35, pp. 12934–12949, 2022.
  33. Y. Li, J. Hu, Y. Wen, G. Evangelidis, K. Salahi, Y. Wang, S. Tulyakov, and J. Ren, “Rethinking vision transformers for mobilenet size and speed,” in Proceedings of the IEEE international conference on computer vision, 2023.
  34. F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
  35. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
  36. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
  37. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1314–1324.
  38. N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “Shufflenet v2: Practical guidelines for efficient cnn architecture design,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 116–131.
  39. M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, “Mnasnet: Platform-aware neural architecture search for mobile,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2820–2828.
  40. M. Nagel, M. v. Baalen, T. Blankevoort, and M. Welling, “Data-free quantization through weight equalization and bias correction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1325–1334.
  41. E. Meller, A. Finkelstein, U. Almog, and M. Grobman, “Same, same but different: Recovering neural network quantization error through weight factorization,” in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 09–15 Jun 2019, pp. 4486–4495.
  42. M. Nagel, R. A. Amjad, M. Van Baalen, C. Louizos, and T. Blankevoort, “Up or down? adaptive rounding for post-training quantization,” in International Conference on Machine Learning. PMLR, 2020, pp. 7197–7206.
  43. Y. Li, R. Gong, X. Tan, Y. Yang, P. Hu, Q. Zhang, F. Yu, W. Wang, and S. Gu, “Brecq: Pushing the limit of post-training quantization by block reconstruction,” arXiv preprint arXiv:2102.05426, 2021.
  44. I. Hubara, Y. Nahshan, Y. Hanani, R. Banner, and D. Soudry, “Accurate post training quantization with small calibration sets,” in International Conference on Machine Learning. PMLR, 2021, pp. 4466–4475.
  45. X. Wei, R. Gong, Y. Li, X. Liu, and F. Yu, “Qdrop: randomly dropping quantization for extremely low-bit post-training quantization,” arXiv preprint arXiv:2203.05740, 2022.
  46. C. Wang, D. Zheng, Y. Liu, and L. Li, “Leveraging inter-layer dependency for post-training quantization,” in Advances in Neural Information Processing Systems.
  47. Z. Liu, Y. Wang, K. Han, W. Zhang, S. Ma, and W. Gao, “Post-training quantization for vision transformer,” Advances in Neural Information Processing Systems, vol. 34, pp. 28092–28103, 2021.
  48. Y. Ding, H. Qin, Q. Yan, Z. Chai, J. Liu, X. Wei, and X. Liu, “Towards accurate post-training quantization for vision transformer,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 5380–5388.
  49. Y. Liu, H. Yang, Z. Dong, K. Keutzer, L. Du, and S. Zhang, “Noisyquant: Noisy bias-enhanced post-training activation quantization for vision transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20321–20330.
  50. R. Li, Y. Wang, F. Liang, H. Qin, J. Yan, and R. Fan, “Fully quantized network for object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2810–2819.
  51. D. Yang and Z. Luo, “A parallel processing cnn accelerator on embedded devices based on optimized mobilenet,” IEEE Internet of Things Journal, 2023.
  52. E. Russo, M. Palesi, S. Monteleone, D. Patti, A. Mineo, G. Ascia, and V. Catania, “Dnn model compression for iot domain-specific hardware accelerators,” IEEE Internet of Things Journal, vol. 9, no. 9, pp. 6650–6662, 2021.
Authors (6)
  1. Jemin Lee (45 papers)
  2. Yongin Kwon (10 papers)
  3. Sihyeong Park (6 papers)
  4. Misun Yu (2 papers)
  5. Jeman Park (8 papers)
  6. Hwanjun Song (44 papers)
Citations (3)