Low-Precision Mixed-Computation Models for Inference on Edge (2312.02210v1)

Published 3 Dec 2023 in cs.LG and cs.AI

Abstract: This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to different weights. Additionally, a gradient approximation for Posit representation is introduced to improve the quality of weight updates in the backpropagation process. Due to the high energy consumption of the fully Posit-based computations, neural network operations are carried out in FixP or Posit/FixP. An efficient hardware implementation of a MAC operation with a first Posit operand and FixP for a second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation is about 1.5% higher than that of FixP with a cost of 0.19% energy overhead.

Summary

  • The paper presents a mixed-computation framework that fuses 4-bit Posit and fixed-point representations to balance precision and hardware efficiency.
  • It employs a sensitivity analysis algorithm and custom gradient approximation to guide layer-wise quantization, minimizing accuracy loss.
  • The approach delivers an average accuracy gain of about 1.5% over fixed-point-only quantization with just a 0.19% energy overhead, making it well suited to edge deployment.

In machine learning on edge devices, computational efficiency is crucial. A standard way to accelerate deep neural network (DNN) inference on such hardware is quantization, the process of mapping high bit-width values to low bit-width representations. This paper builds on that idea with a mixed-computation framework that combines the strengths of two distinct number systems: low-precision Posit numbers and low-precision fixed-point numbers.
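To make the fixed-point baseline concrete, here is a minimal sketch of symmetric 4-bit fixed-point (FixP4) quantization with a per-tensor scale; the function name and scaling choice are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_fixp4(weights: np.ndarray) -> np.ndarray:
    """Snap weights to a signed 4-bit fixed-point grid, then dequantize."""
    qmax = 7                                    # signed 4-bit integers span [-8, 7]
    scale = np.max(np.abs(weights)) / qmax      # hypothetical per-tensor scale
    q = np.clip(np.round(weights / scale), -8, qmax)
    return q * scale                            # values now lie on a uniform grid

w = np.random.randn(8).astype(np.float32)
print(quantize_fixp4(w))
```

The key property to note is that the resulting grid is uniform: every representable value sits the same distance from its neighbors, regardless of magnitude.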

The mixed-computation approach hinges on the strategic allocation of 4-bit Posit (Posit4) and 4-bit fixed-point (FixP4) representations within a network. Posit4 is assigned to weights that are sensitive to quantization error, where its higher precision around zero has a significant effect on model accuracy. FixP4 is used for the remaining weights, which tolerate quantization error better and benefit from the hardware efficiency of fixed-point arithmetic.

Posit numbers hold a notable advantage over traditional fixed-point and floating-point formats: for a given bit width they offer tapered precision, with values packed most densely around zero, together with a wide dynamic range. Posits also exhibit gradual underflow and overflow, avoiding abrupt jumps to zero or infinity and thus preserving more information than other number systems at the same width.
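The non-uniform spacing is easiest to see by enumerating all sixteen 4-bit codes. The toy decoder below follows the standard posit layout with es = 0 exponent bits, which is a common choice at 4 bits; treat es = 0 as an assumption for illustration, not necessarily the paper's exact configuration.

```python
def decode_posit4(bits: int, es: int = 0) -> float:
    """Decode one 4-bit posit code (standard posit layout, es exponent bits)."""
    n = 4
    bits &= 0xF
    if bits == 0b0000:
        return 0.0
    if bits == 0b1000:
        return float("nan")                     # NaR: "not a real"
    sign = -1.0 if bits & 0b1000 else 1.0
    if sign < 0:
        bits = (-bits) & 0xF                    # negatives are stored in two's complement
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    run = 1                                     # regime: run of identical leading bits
    while run < len(body) and body[run] == body[0]:
        run += 1
    k = (run - 1) if body[0] == 1 else -run
    rest = body[run + 1:]                       # skip the regime terminator bit
    exp = 0
    for i in range(es):                         # exponent bits (none when es = 0)
        exp = (exp << 1) | (rest[i] if i < len(rest) else 0)
    frac = sum(b / 2 ** (i + 1) for i, b in enumerate(rest[es:]))
    useed = 2 ** (2 ** es)
    return sign * (useed ** k) * (2 ** exp) * (1.0 + frac)

# Spacing is finest near zero (0.25 apart) and coarsest at the extremes (2.0 apart).
print(sorted(decode_posit4(p) for p in range(16) if p != 0b1000))
```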

To decide which number system each set of weights receives, the researchers developed a sensitivity analysis heuristic. It weighs the quantization error of the weights against their impact on the network's output, and this score guides the allocation of number formats across the architecture.
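A hedged sketch of how such an allocation might look in code: each layer is scored by how much the Posit4 quantizer reduces error relative to FixP4 (a simple stand-in for the paper's combined importance-and-error criterion), and the layers that benefit most receive Posit4. The function names, the MSE proxy, and the `budget` parameter are illustrative assumptions.

```python
import numpy as np

def quant_error(weights: np.ndarray, quantizer) -> float:
    """Mean squared error introduced by a given quantizer (error proxy only)."""
    return float(np.mean((weights - quantizer(weights)) ** 2))

def assign_formats(layers: dict, quant_posit4, quant_fixp4, budget: int) -> dict:
    """Give Posit4 to the `budget` layers that gain the most from it."""
    gains = {
        name: quant_error(w, quant_fixp4) - quant_error(w, quant_posit4)
        for name, w in layers.items()
    }
    chosen = set(sorted(gains, key=gains.get, reverse=True)[:budget])
    return {name: ("Posit4" if name in chosen else "FixP4") for name in layers}
```

Here `quant_fixp4` could be the `quantize_fixp4` function sketched earlier, and `quant_posit4` a nearest-value mapping onto the Posit4 grid enumerated above.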

The paper also introduces a custom gradient approximation for backpropagation, needed because the Posit quantizer is non-uniform; this makes weight updates more accurate during quantization-aware training. On the hardware side, the authors present an efficient design for the multiply-accumulate (MAC) operation, the workhorse of DNN inference, with a Posit first operand and a FixP second operand and accumulator.
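The training side is easiest to picture as a quantization-aware training step. The PyTorch sketch below snaps weights to the Posit4 (es = 0) grid in the forward pass and uses a simple straight-through surrogate gradient in the backward pass; the paper's gradient approximation is tailored to the non-uniform Posit grid, so the identity pass-through here is a stand-in, not the authors' exact rule.

```python
import torch

# The 15 finite Posit4 (es = 0) values, assuming a unit scale.
POSIT4_VALUES = torch.tensor(
    [-4.0, -2.0, -1.5, -1.0, -0.75, -0.5, -0.25, 0.0,
      0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0]
)

class Posit4Quantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # Snap each weight to the nearest representable Posit4 value.
        idx = torch.argmin((w.unsqueeze(-1) - POSIT4_VALUES).abs(), dim=-1)
        return POSIT4_VALUES[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through surrogate: pass the gradient through unchanged.
        return grad_output

w = torch.randn(5, requires_grad=True)
q = Posit4Quantize.apply(w)
q.sum().backward()
print(q, w.grad)
```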

Evaluation across a range of vision and language models showed a consistent accuracy improvement over models quantized exclusively with fixed point. The gains averaged about 1.5%, with an energy overhead of just 0.19%. Applied to widely used architectures such as ResNet, VGG, MobileNet, BERT, and GPT, the method showed particularly strong advantages.

The paper shows that the added energy cost of mixed computation is small once the MAC unit's consumption is viewed in the context of the overall system. Pairing Posit numbers, with their extra precision near zero and broad dynamic range, with the widely used FixP format points to a promising path for optimizing machine learning models on edge devices. This is especially relevant as demand for on-device AI grows and privacy concerns and real-time processing requirements push more computation to the edge.

In summary, the research presents an innovative method that adeptly balances computational efficiency with model accuracy. As machine learning applications continue to pervade everyday devices and necessitate local, real-time processing, such advancements in low-precision computation models could be pivotal in enabling smarter edge-based AI without straining device resources.
