DeepShift: Towards Multiplication-Less Neural Networks (1905.13298v5)

Published 30 May 2019 in cs.LG and cs.NE

Abstract: The high computation, memory, and power budgets of inferring convolutional neural networks (CNNs) are major bottlenecks of model deployment to edge computing platforms, e.g., mobile devices and IoT. Moreover, training CNNs is time and energy-intensive even on high-grade servers. Convolution layers and fully connected layers, because of their intense use of multiplications, are the dominant contributor to this computation budget. We propose to alleviate this problem by introducing two new operations: convolutional shifts and fully-connected shifts which replace multiplications with bitwise shift and sign flipping during both training and inference. During inference, both approaches require only 5 bits (or less) to represent the weights. This family of neural network architectures (that use convolutional shifts and fully connected shifts) is referred to as DeepShift models. We propose two methods to train DeepShift models: DeepShift-Q which trains regular weights constrained to powers of 2, and DeepShift-PS that trains the values of the shifts and sign flips directly. Very close accuracy, and in some cases higher accuracy, to baselines are achieved. Converting pre-trained 32-bit floating-point baseline models of ResNet18, ResNet50, VGG16, and GoogleNet to DeepShift and training them for 15 to 30 epochs, resulted in Top-1/Top-5 accuracies higher than that of the original model. Last but not least, we implemented the convolutional shifts and fully connected shift GPU kernels and showed a reduction in latency time of 25% when inferring ResNet18 compared to unoptimized multiplication-based GPU kernels. The code can be found at https://github.com/mostafaelhoushi/DeepShift.

Analysis of "DeepShift: Towards Multiplication-Less Neural Networks"

The paper, "DeepShift: Towards Multiplication-Less Neural Networks," addresses a critical performance bottleneck associated with the deployment of convolutional neural networks (CNNs) on resource-constrained edge devices, such as mobile devices and IoT platforms. By focusing on reducing the computational overhead of CNNs, the authors propose a novel approach that substitutes conventional multiplication operations in CNNs with bitwise shifts and sign flips, which inherently require fewer computational resources.

Summary of Contributions

  1. Introduction of DeepShift Models: The paper introduces DeepShift models, which leverage convolutional shifts and fully connected shifts to replace multiplication operations with more computationally efficient bitwise operations. Specifically, these models perform shifts and sign flips, reducing the need for power- and resource-intensive multiplication operations.
  2. Training Paradigms: Two training methodologies, DeepShift-Q and DeepShift-PS, are proposed for training these models. DeepShift-Q trains conventional weights that are constrained (rounded) to powers of two, while DeepShift-PS directly trains the shift values and sign flips. This distinction allows training to be tailored to different computational constraints or accuracy requirements (a minimal sketch of both parameterizations follows this list).
  3. Empirical Results: The empirical evaluation demonstrates that DeepShift models maintain accuracy comparable to state-of-the-art baselines while significantly reducing computational cost. The paper reports competitive results on benchmark datasets, including CIFAR10 and ImageNet, with Top-1 and Top-5 accuracies exceeding those of the original models in some cases. The authors showcase the successful conversion and retraining of well-known architectures, such as ResNet18, ResNet50, VGG16, and GoogleNet, achieving accuracy improvements of 0.1% to 0.95% over the baseline models.
  4. Reduction in Latency and Model Size: A practical contribution of this paper is the reduction in inference latency and memory footprint of DeepShift models. With the shift-based operators implemented as GPU kernels, inference latency for ResNet18 dropped by 25% compared to unoptimized multiplication-based GPU kernels. Moreover, the weights require only 5 bits (or fewer) to represent, further reducing model size and memory requirements.
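As a concrete illustration of the two parameterizations described above (a simplified sketch under assumed names, not the authors' implementation from the linked repository): DeepShift-Q keeps ordinary weights and rounds them to signed powers of two, while DeepShift-PS trains the sign flips S and shift amounts P directly. The conversion between the two views looks roughly like this:

```python
import numpy as np

# Illustrative sketch (a simplification, not the authors' code):
# converting a pre-trained floating-point weight matrix into the
# sign / power-of-two form used by DeepShift-Q, which also yields the
# (sign, shift) parameters that DeepShift-PS would train directly.

def quantize_to_shift(W: np.ndarray):
    """Round each weight to the nearest signed power of two."""
    sign = np.sign(W)                                # -1, 0, or +1 per weight
    mag = np.where(W == 0, 1.0, np.abs(W))           # avoid log2(0)
    shift = np.round(np.log2(mag)).astype(np.int32)  # integer exponents
    W_q = np.where(W == 0, 0.0, sign * np.exp2(shift))
    return W_q, sign.astype(np.int8), shift


W = np.array([[0.3, -0.12], [0.9, 0.05]])
W_q, S, P = quantize_to_shift(W)
# W_q ~= [[0.25, -0.125], [1.0, 0.0625]] is what the forward pass uses;
# S (sign flips) and P (shift amounts) are what DeepShift-PS learns
# directly, and P needs only a few bits because typical CNN weights
# span a narrow range of exponents.
```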

Practical and Theoretical Implications

The practical benefits of DeepShift models lie in their reduced computational requirements, which are particularly advantageous for edge computing applications. The reduction in power consumption aligns well with the trend toward running more AI capabilities on-device rather than relying on cloud-based models, which add latency and introduce privacy risks.

Theoretically, the paper offers an innovative perspective on bit-level optimization in neural networks. By demonstrating the viability of bitwise shift operations as a substitute for multiplication, this research invites future exploration into other integer-based operations and further miniaturization of neural network models.
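As a back-of-the-envelope illustration of the abstract's claim that 5 bits (or less) suffice per weight (an assumed encoding for illustration, not the paper's documented storage format): one sign bit plus a 4-bit shift magnitude already covers 16 exponent values.

```python
# Hypothetical 5-bit encoding sketch: [sign bit | 4-bit |shift|],
# covering shift amounts from 0 down to -15.

def pack_weight(sign: int, shift: int) -> int:
    """Pack a (sign, shift) pair into 5 bits."""
    assert sign in (-1, 1) and -15 <= shift <= 0
    sign_bit = 1 if sign < 0 else 0
    return (sign_bit << 4) | (-shift)


def unpack_weight(code: int):
    sign = -1 if (code >> 4) & 1 else 1
    shift = -(code & 0b1111)
    return sign, shift


assert unpack_weight(pack_weight(-1, -7)) == (-1, -7)
```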

Future Directions

This research paves the way for several future explorations:

  • Hardware Integration: The integration of shift-based neural network operations into custom hardware or FPGAs could potentially yield even better performance gains and efficiency than software-level optimizations alone.
  • Generalization to Other Network Architectures: While demonstrated on CNNs, the principles behind DeepShift could be adapted for other architectures such as transformers or recurrent neural networks, potentially expanding its applicability.
  • Low-Power Training: While the paper focuses on inference efficiency, future investigations could target training efficiency on edge devices, potentially enabling on-device model updates without relying on centralized server resources.

In summary, the work successfully showcases a reduction in computational overhead for neural networks without substantial trade-offs in accuracy, pushing forward the state-of-the-art in edge computing optimization and opening possibilities for more efficient AI applications.

Authors (5)
  1. Mostafa Elhoushi (22 papers)
  2. Zihao Chen (54 papers)
  3. Farhan Shafiq (4 papers)
  4. Ye Henry Tian (2 papers)
  5. Joey Yiwei Li (2 papers)
Citations (90)