
Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs

Published 31 Jan 2024 in cs.LG and cs.CV | arXiv:2401.17544v1

Abstract: Quantization is a crucial technique for deploying deep learning models on resource-constrained devices such as embedded FPGAs. Prior efforts mostly focus on quantizing matrix multiplications, leaving other layers such as BatchNorm and shortcuts in floating point, even though fixed-point arithmetic is more efficient on FPGAs. A common practice is to fine-tune a pre-trained model to fixed point for FPGA deployment, which can degrade accuracy. This work presents QFX, a novel trainable fixed-point quantization approach that automatically learns the binary-point position during model training. Additionally, we introduce a multiplier-free quantization strategy within QFX to minimize DSP usage. QFX is implemented as a PyTorch-based library that efficiently emulates fixed-point arithmetic, as supported by FPGA HLS, in a differentiable manner during backpropagation. With minimal effort, models trained with QFX can readily be deployed through HLS and produce the same numerical results as their software counterparts. Our evaluation shows that, compared to post-training quantization, QFX quantizes element-wise layers to fewer bits while achieving higher accuracy on both CIFAR-10 and ImageNet. We further demonstrate the efficacy of multiplier-free quantization using a state-of-the-art binarized neural network accelerator designed for an embedded FPGA (AMD Xilinx Ultra96 v2). We plan to release QFX as open source.
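To make the two quantization modes in the abstract concrete, the sketch below emulates signed fixed-point quantization (where the binary-point position, `frac_bits`, determines the trade-off between precision and dynamic range) and a power-of-two "multiplier-free" variant (where a multiply reduces to a bit shift in hardware). This is an illustrative sketch, not QFX's actual API: the function names are hypothetical, and the learnable part of QFX (training `frac_bits` via backpropagation, typically with a straight-through gradient estimator) is omitted here.

```python
import math

def quantize_fxp(x: float, total_bits: int = 8, frac_bits: int = 4) -> float:
    """Emulate signed fixed-point quantization: round x onto a grid of
    2**-frac_bits and saturate to the representable integer range."""
    scale = 2 ** frac_bits
    int_min = -(2 ** (total_bits - 1))      # e.g. -128 for 8 bits
    int_max = 2 ** (total_bits - 1) - 1     # e.g.  127 for 8 bits
    q = max(int_min, min(int_max, round(x * scale)))
    return q / scale

def quantize_po2(x: float) -> float:
    """Multiplier-free variant: snap |x| to the nearest power of two,
    so multiplying by x becomes a shift instead of a DSP multiply."""
    if x == 0.0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    return sign * 2.0 ** round(math.log2(abs(x)))

# More fractional bits -> finer grid but smaller dynamic range.
print(quantize_fxp(0.3, 8, 4))    # 0.3125   (grid step 1/16)
print(quantize_fxp(0.3, 8, 6))    # 0.296875 (grid step 1/64)
print(quantize_fxp(100.0, 8, 4))  # 7.9375   (saturated at 127/16)
print(quantize_po2(0.3))          # 0.25     (2**-2)
```

Moving the binary point (here, a fixed hyperparameter) is exactly the choice QFX learns per layer during training, which matters most for the element-wise layers (BatchNorm, shortcuts) that prior work left in floating point.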
