Low-Precision Mixed-Computation Models for Inference on Edge (2312.02210v1)
Abstract: This paper presents a mixed-computation neural-network processing approach for edge applications that combines low-precision (low bit-width) Posit and low-precision fixed-point (FixP) number systems. The approach employs 4-bit Posit (Posit4), which offers higher precision around zero, to represent weights with high sensitivity, while 4-bit FixP (FixP4) is used for the remaining weights. A heuristic that analyzes the importance and quantization error of the weights is presented to assign the appropriate number system to each weight. Additionally, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation is energy-hungry, neural-network operations are carried out in FixP or mixed Posit/FixP form. An efficient hardware implementation of a MAC operation with a Posit first operand and a FixP second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision models and LLMs. The results show that, on average, the accuracy of the mixed-computation approach is about 1.5% higher than that of FixP, at the cost of a 0.19% energy overhead.
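
The core idea is to represent high-sensitivity weights in a 4-bit posit format and the rest in 4-bit fixed point, with an importance/quantization-error heuristic deciding which number system each weight group gets. The sketch below is illustrative only, not the paper's implementation: it enumerates the values representable by a 4-bit posit, quantizes weights to the nearest representable value, and picks Posit4 or FixP4 for a weight group by comparing a sensitivity-weighted quantization error. The choice of es = 0, the per-group grid scaling, and the assumption that a sensitivity score per weight is supplied externally (e.g., from gradient magnitudes) are all assumptions made for this example.

```python
import numpy as np

def posit_decode(pattern: int, nbits: int = 4, es: int = 0) -> float:
    """Decode an nbits-wide posit bit pattern into a float.
    Returns 0.0 for the zero pattern and NaN for NaR (not a real)."""
    mask = (1 << nbits) - 1
    pattern &= mask
    if pattern == 0:
        return 0.0
    if pattern == 1 << (nbits - 1):
        return float("nan")
    sign = (pattern >> (nbits - 1)) & 1
    if sign:                                   # negate via two's complement
        pattern = (-pattern) & mask
    bits = [(pattern >> i) & 1 for i in range(nbits - 2, -1, -1)]
    run_bit, run_len = bits[0], 1              # regime: run of identical bits
    while run_len < len(bits) and bits[run_len] == run_bit:
        run_len += 1
    k = run_len - 1 if run_bit == 1 else -run_len
    rest = bits[run_len + 1:]                  # skip the regime terminator
    exp = 0
    for b in rest[:es]:
        exp = (exp << 1) | b
    exp <<= es - len(rest[:es])                # missing exponent bits are 0
    frac = 1.0                                 # hidden bit
    for i, b in enumerate(rest[es:]):
        frac += b * 2.0 ** -(i + 1)
    value = (2.0 ** (2 ** es)) ** k * (2.0 ** exp) * frac
    return -value if sign else value

def posit_grid(nbits: int = 4, es: int = 0) -> np.ndarray:
    """All finite values representable by an (nbits, es) posit, sorted."""
    vals = [posit_decode(p, nbits, es) for p in range(1 << nbits)]
    return np.array(sorted(v for v in vals if not np.isnan(v)))

def quantize_to_grid(w: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Round each weight to its nearest value in the grid."""
    return grid[np.abs(w[..., None] - grid).argmin(axis=-1)]

def assign_number_system(w: np.ndarray, sensitivity: np.ndarray) -> str:
    """Pick 'posit4' or 'fixp4' for a weight group by comparing a
    sensitivity-weighted quantization error (a simplified stand-in for the
    paper's importance/quantization-error heuristic)."""
    p_grid = posit_grid(4, 0) * (np.abs(w).max() / 4.0)   # maxpos(4, es=0) = 4
    f_grid = np.arange(-8, 8) * (np.abs(w).max() / 7.0)   # symmetric FixP4
    err_p = np.mean(sensitivity * np.abs(w - quantize_to_grid(w, p_grid)))
    err_f = np.mean(sensitivity * np.abs(w - quantize_to_grid(w, f_grid)))
    return "posit4" if err_p < err_f else "fixp4"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=1024)       # toy, zero-centered weights
    sens = np.abs(rng.normal(size=1024))       # stand-in sensitivity scores
    print(posit_grid(4, 0))                    # nonuniform grid, dense near 0
    print(assign_number_system(w, sens))
```

Printing `posit_grid(4, 0)` makes the abstract's "higher precision around zero" concrete: the representable values cluster near zero (steps of 0.25 up to 1) and thin out toward the extremes (2 and 4), whereas the FixP4 grid is uniformly spaced.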