
Training and Inference with Integers in Deep Neural Networks (1802.04680v1)

Published 13 Feb 2018 in cs.LG

Abstract: Research on deep neural networks with discrete parameters and their deployment in embedded systems has been an active and promising topic. Although previous works have successfully reduced precision in inference, transferring both training and inference processes to low-bitwidth integers has not been demonstrated simultaneously. In this work, we develop a new method termed "WAGE" to discretize both training and inference, where weights (W), activations (A), gradients (G), and errors (E) among layers are shifted and linearly constrained to low-bitwidth integers. To perform pure discrete dataflow for fixed-point devices, we further replace batch normalization with a constant scaling layer and simplify other components that are arduous for integer implementation. Improved accuracies can be obtained on multiple datasets, which indicates that WAGE somehow acts as a type of regularization. Empirically, we demonstrate the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.

Authors (4)
  1. Shuang Wu (99 papers)
  2. Guoqi Li (90 papers)
  3. Feng Chen (261 papers)
  4. Luping Shi (21 papers)
Citations (366)

Summary

  • The paper introduces the WAGE framework that quantizes weights, activations, gradients, and errors, enabling both training and inference using low-bit integers.
  • The methodology replaces complex floating-point operations, such as batch normalization, with simpler constant scaling and stochastic rounding techniques.
  • Empirical results on datasets like CIFAR-10 demonstrate that integer-only operations achieve competitive accuracy while enhancing energy efficiency for embedded systems.

Training and Inference with Integers in Deep Neural Networks

The paper "Training and Inference with Integers in Deep Neural Networks" presents a novel methodology termed WAGE for discretizing both the training and inference phases of deep neural networks (DNNs) into low-bitwidth integers. This is a significant step towards realizing the implementation of neural networks in fixed-point hardware, which could lead to enhanced energy efficiency in hardware accelerators such as neuromorphic chips.

Methodological Innovations

The WAGE framework derives its name from its focus on quantizing four distinct operands: Weights (W), Activations (A), Gradients (G), and Errors (E). Previous approaches primarily concentrated on low-precision inference but maintained high precision during training, which is computationally expensive and unsuitable for embedded systems with limited resources. WAGE distinguishes itself by applying quantization to both training and inference processes, transforming them into low-bitwidth operations suitable for integer-based hardware.
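As a rough illustration of what such a low-bitwidth mapping looks like, the following is a minimal NumPy sketch of a uniform k-bit quantizer in the spirit of the paper's linear constraint. The function name and the exact clipping range are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def quantize(x, k):
    """Uniformly quantize x to k-bit fixed-point values in (-1, 1).

    sigma = 2**(1 - k) is the smallest representable step; values are
    rounded to multiples of sigma and clipped so they fit in k bits.
    """
    sigma = 2.0 ** (1 - k)
    return np.clip(sigma * np.round(x / sigma), -1.0 + sigma, 1.0 - sigma)

# Example: snap a few activations onto an 8-bit grid.
x = np.array([0.7321, -0.0013, 1.2, -0.49])
print(quantize(x, 8))  # multiples of 2**-7, clipped to (-1, 1)
```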

Key innovations of the WAGE framework include:

  • Quantization Functions: The framework uses a linear mapping with uniform discretization intervals and introduces stochastic rounding so that small, real-valued weight updates are preserved in expectation when gradients are accumulated.
  • Replacement of Complex Operations: Batch normalization, commonly used to stabilize and accelerate training, is replaced with a constant scaling layer, because its floating-point statistics and divisions are ill-suited to integer-based computation (both ideas are sketched in the code after this list).
  • Weight Initialization and Scaling: A novel weight initialization and scaling strategy helps preserve the distribution of weights, ensuring efficient training convergence without batch normalization.
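A minimal sketch of the first two ideas, assuming NumPy: stochastic rounding keeps tiny weight updates alive in expectation, and a fixed per-layer constant stands in for batch normalization. The function names, the choice of alpha, and the toy bitwidth are illustrative assumptions rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x):
    """Round toward a neighboring integer with probability equal to the
    fractional distance, so tiny updates survive in expectation."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

def quantize_update(delta_w, k):
    """Accumulate a small real-valued update onto a k-bit weight grid
    (step sigma) via stochastic rounding."""
    sigma = 2.0 ** (1 - k)
    return sigma * stochastic_round(delta_w / sigma)

def constant_scale(x, alpha):
    """Stand-in for batch normalization: a fixed, layer-wise constant
    divisor (alpha would be derived from the layer's fan-in at init)."""
    return x / alpha

# Example: a 1e-4 update still changes 2-bit weights once in a while,
# and the quantized updates are unbiased on average.
updates = np.full(1_000_000, 1e-4)
print(quantize_update(updates, 2).mean())   # close to 1e-4

# Example: scale pre-activations by a fixed constant instead of batch norm.
acts = constant_scale(rng.normal(size=(4, 8)), alpha=16.0)
```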

Empirical Evaluation

The WAGE method was evaluated across multiple datasets, including MNIST, CIFAR-10, SVHN, and ImageNet, and showed comparable accuracy to networks that use floating-point operations during training. For instance, on CIFAR-10, WAGE achieved a test error rate of 6.78%, which is competitive with existing low-bitwidth methods but provides the added advantage of integer-only dataflows.

To assess the impact of integer quantization, the paper explored the bitwidth requirements for the gradient and error operands. The investigation revealed that low bitwidths (such as 8 bits) are sufficient for maintaining performance, indicating the potential for energy and storage savings.
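To make the bitwidth trade-off concrete, here is a hedged sketch of the kind of sweep described above: synthetic Gaussian "errors" are rescaled by a power-of-two shift, snapped onto a k-bit grid, and compared against the originals. The shift and quantizer follow the spirit of the paper's shift-and-quantize scheme but are simplified, and the printed numbers come from synthetic data, not from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift(x):
    """Power-of-two scale of the largest magnitude (integer-friendly)."""
    return 2.0 ** np.round(np.log2(np.max(np.abs(x)) + 1e-12))

def quantize_error(e, k):
    """Rescale an error tensor by a power-of-two shift, then snap it
    onto a k-bit grid; a simplified stand-in for error quantization."""
    sigma = 2.0 ** (1 - k)
    e_unit = e / shift(e)
    q = sigma * np.round(e_unit / sigma)
    return np.clip(q, -1.0 + sigma, 1.0 - sigma)

# Sweep bitwidths on synthetic Gaussian "errors" (illustrative only).
e = rng.normal(scale=1e-3, size=100_000)
for k in (4, 8, 12, 16):
    rel = np.linalg.norm(quantize_error(e, k) * shift(e) - e) / np.linalg.norm(e)
    print(f"{k:2d}-bit relative error: {rel:.4f}")
```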

Additionally, the paper illustrates that certain quantization-related elements of WAGE, such as the orientation-preserving error quantization strategy, act as a form of regularization and help reduce overfitting.

Implications and Future Directions

The authors highlight the practical implications of this research in terms of energy efficiency and reduced computational resources. By successfully implementing both the training and inference phases in low-bitwidth integers, the WAGE framework paves the way for the deployment of sophisticated AI systems on mobile and embedded devices, facilitating on-device learning and inference with reduced energy consumption.

Future research directions suggested by the authors include optimizing multiply-accumulate (MAC) operations for integer arithmetic, exploring non-linear quantization techniques, and refining normalization strategies compatible with integer dataflows. The continued advancement of such frameworks is crucial for enabling real-time, on-device AI applications that are both efficient and effective.

In summary, this paper contributes a comprehensive framework for conducting neural network training and inference entirely with low-bitwidth integer operations. This advancement bears significant potential for enhancing the energy efficiency and portability of AI systems, promising far-reaching impacts on the deployment of deep learning models in diverse, resource-constrained environments.