Low-Precision Mixed-Computation Models for Inference on Edge (2312.02210v1)

Published 3 Dec 2023 in cs.LG and cs.AI

Abstract: This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to different weights. Additionally, a gradient approximation for Posit representation is introduced to improve the quality of weight updates in the backpropagation process. Due to the high energy consumption of the fully Posit-based computations, neural network operations are carried out in FixP or Posit/FixP. An efficient hardware implementation of a MAC operation with a first Posit operand and FixP for a second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation is about 1.5% higher than that of FixP with a cost of 0.19% energy overhead.

Summary

  • The paper presents a mixed-computation framework that fuses 4-bit Posit and fixed-point representations to balance precision and hardware efficiency.
  • It employs a sensitivity analysis algorithm and custom gradient approximation to guide layer-wise quantization, minimizing accuracy loss.
  • The approach delivers an average accuracy gain of about 1.5% over fixed-point-only quantization with just a 0.19% energy overhead, making it well suited to edge deployment.

In machine learning on edge devices, computational efficiency is crucial. A standard way to accelerate deep neural network (DNN) inference on such hardware is quantization, the process of mapping high bit-width values to low bit-width representations. This paper builds on that idea with a mixed-computation framework that combines the strengths of two distinct number systems: low-precision Posit numbers and low-precision fixed-point numbers.
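To make the fixed-point baseline concrete, here is a minimal sketch of symmetric 4-bit fixed-point (FixP4) quantization with a per-tensor scale; the function name and scaling choice are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_fixp4(weights: np.ndarray) -> np.ndarray:
    """Snap weights to a signed 4-bit fixed-point grid, then dequantize."""
    qmax = 7                                    # signed 4-bit integers span [-8, 7]
    scale = np.max(np.abs(weights)) / qmax      # hypothetical per-tensor scale
    q = np.clip(np.round(weights / scale), -8, qmax)
    return q * scale                            # values now lie on a uniform grid

w = np.random.randn(8).astype(np.float32)
print(quantize_fixp4(w))
```

The key property to note is that the resulting grid is uniform: every representable value sits the same distance from its neighbors, regardless of magnitude.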

The mixed-computation approach hinges on the strategic allocation of 4-bit Posit (Posit4) and 4-bit fixed-point (FixP4) representations within a network. Posit4 is assigned to weights that are sensitive to quantization error, where its higher precision around zero has a significant effect on model accuracy. FixP4 is used for the remaining weights, which tolerate quantization error better and benefit from the hardware efficiency of fixed-point arithmetic.

Posit numbers hold a notable advantage over traditional fixed-point and floating-point formats: for a given bit width they offer tapered precision, with values packed most densely around zero, together with a wide dynamic range. Posits also exhibit gradual underflow and overflow, avoiding abrupt jumps to zero or infinity and thus preserving more information than other number systems at the same width.
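The non-uniform spacing is easiest to see by enumerating all sixteen 4-bit codes. The toy decoder below follows the standard posit layout with es = 0 exponent bits, which is a common choice at 4 bits; treat es = 0 as an assumption for illustration, not necessarily the paper's exact configuration.

```python
def decode_posit4(bits: int, es: int = 0) -> float:
    """Decode one 4-bit posit code (standard posit layout, es exponent bits)."""
    n = 4
    bits &= 0xF
    if bits == 0b0000:
        return 0.0
    if bits == 0b1000:
        return float("nan")                     # NaR: "not a real"
    sign = -1.0 if bits & 0b1000 else 1.0
    if sign < 0:
        bits = (-bits) & 0xF                    # negatives are stored in two's complement
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    run = 1                                     # regime: run of identical leading bits
    while run < len(body) and body[run] == body[0]:
        run += 1
    k = (run - 1) if body[0] == 1 else -run
    rest = body[run + 1:]                       # skip the regime terminator bit
    exp = 0
    for i in range(es):                         # exponent bits (none when es = 0)
        exp = (exp << 1) | (rest[i] if i < len(rest) else 0)
    frac = sum(b / 2 ** (i + 1) for i, b in enumerate(rest[es:]))
    useed = 2 ** (2 ** es)
    return sign * (useed ** k) * (2 ** exp) * (1.0 + frac)

# Spacing is finest near zero (0.25 apart) and coarsest at the extremes (2.0 apart).
print(sorted(decode_posit4(p) for p in range(16) if p != 0b1000))
```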

To decide which number system each set of weights receives, the researchers developed a sensitivity analysis heuristic. It weighs the quantization error of the weights against their impact on the network's output, and this score guides the allocation of number formats across the architecture.
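A hedged sketch of how such an allocation might look in code: each layer is scored by how much the Posit4 quantizer reduces error relative to FixP4 (a simple stand-in for the paper's combined importance-and-error criterion), and the layers that benefit most receive Posit4. The function names, the MSE proxy, and the `budget` parameter are illustrative assumptions.

```python
import numpy as np

def quant_error(weights: np.ndarray, quantizer) -> float:
    """Mean squared error introduced by a given quantizer (error proxy only)."""
    return float(np.mean((weights - quantizer(weights)) ** 2))

def assign_formats(layers: dict, quant_posit4, quant_fixp4, budget: int) -> dict:
    """Give Posit4 to the `budget` layers that gain the most from it."""
    gains = {
        name: quant_error(w, quant_fixp4) - quant_error(w, quant_posit4)
        for name, w in layers.items()
    }
    chosen = set(sorted(gains, key=gains.get, reverse=True)[:budget])
    return {name: ("Posit4" if name in chosen else "FixP4") for name in layers}
```

Here `quant_fixp4` could be the `quantize_fixp4` function sketched earlier, and `quant_posit4` a nearest-value mapping onto the Posit4 grid enumerated above.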

The paper also introduces a custom gradient approximation for backpropagation, needed because the Posit quantizer is non-uniform; this makes weight updates more accurate during quantization-aware training. On the hardware side, the authors present an efficient design for the multiply-accumulate (MAC) operation, the workhorse of DNN inference, with a Posit first operand and a FixP second operand and accumulator.
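The training side is easiest to picture as a quantization-aware training step. The PyTorch sketch below snaps weights to the Posit4 (es = 0) grid in the forward pass and uses a simple straight-through surrogate gradient in the backward pass; the paper's gradient approximation is tailored to the non-uniform Posit grid, so the identity pass-through here is a stand-in, not the authors' exact rule.

```python
import torch

# The 15 finite Posit4 (es = 0) values, assuming a unit scale.
POSIT4_VALUES = torch.tensor(
    [-4.0, -2.0, -1.5, -1.0, -0.75, -0.5, -0.25, 0.0,
      0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0]
)

class Posit4Quantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # Snap each weight to the nearest representable Posit4 value.
        idx = torch.argmin((w.unsqueeze(-1) - POSIT4_VALUES).abs(), dim=-1)
        return POSIT4_VALUES[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through surrogate: pass the gradient through unchanged.
        return grad_output

w = torch.randn(5, requires_grad=True)
q = Posit4Quantize.apply(w)
q.sum().backward()
print(q, w.grad)
```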

Evaluation across a range of vision and language models showed a consistent accuracy improvement over models quantized exclusively with fixed point. The gains averaged about 1.5%, with an energy overhead of just 0.19%. Applied to widely used architectures such as ResNet, VGG, MobileNet, BERT, and GPT, the method showed particularly strong advantages.

The paper shows that the added energy cost of mixed computation is small once the MAC unit's consumption is viewed in the context of the overall system. Pairing Posit numbers, with their extra precision near zero and broad dynamic range, with the widely used FixP format points to a promising path for optimizing machine learning models on edge devices. This is especially relevant as demand for on-device AI grows and privacy concerns and real-time processing requirements push more computation to the edge.

In summary, the research presents an innovative method that adeptly balances computational efficiency with model accuracy. As machine learning applications continue to pervade everyday devices and necessitate local, real-time processing, such advancements in low-precision computation models could be pivotal in enabling smarter edge-based AI without straining device resources.
