Convolutional Neural Networks using Logarithmic Data Representation (1603.01025v2)

Published 3 Mar 2016 in cs.NE and cs.LG

Abstract: Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers. Finally, we propose an end-to-end training procedure that uses log representation at 5-bits, which achieves higher final test accuracy than linear at 5-bits.

Citations (419)

Summary

  • The paper shows that logarithmic representation allows CNNs to operate at 3-bit precision with minimal accuracy loss, whereas fixed-point encodings degrade sharply below 8 bits.
  • The approach eliminates the need for digital multipliers, thereby reducing hardware complexity for efficient dot-product computations.
  • Experimental validation on datasets like CIFAR-10 and ILSVRC-2012 confirms that logarithmic backpropagation maintains robust training and significantly reduces model sizes.

Analyzing Convolutional Neural Networks Using Logarithmic Data Representation

This paper presents a novel approach to data representation in convolutional neural networks (CNNs) by utilizing logarithmic transformation for both activations and weights. The authors, Daisuke Miyashita, Edward H. Lee, and Boris Murmann, contend that using a base-2 logarithmic representation allows for significant reductions in data precision requirements without compromising classification performance. Specifically, this method enables encoding data at a 3-bit resolution while maintaining negligible loss of accuracy, which is a marked improvement over traditional fixed-point representation where reducing precision below 8 bits often leads to noticeable performance degradation.
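
The core operation is a quantizer that snaps each value to the nearest signed power of two. Below is a minimal NumPy sketch of such a base-2 logarithmic quantizer, assuming round-to-nearest in the log domain and an illustrative clipping range; the paper's exact handling of the full-scale range (FSR) and of the zero code may differ.

```python
import numpy as np

def log_quantize(x, bits=3, fsr=0):
    """Snap values to a base-2 logarithmic grid (illustrative sketch).

    Each nonzero entry becomes +/- 2^k, where k is the rounded base-2 log of
    its magnitude, clipped so that only 2**bits exponent levels (ending at
    `fsr`, a stand-in for the paper's full-scale range) are representable.
    Zeros are kept as zero.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    k = np.round(np.log2(np.abs(x[nz])))          # nearest power-of-two exponent
    k = np.clip(k, fsr - 2**bits + 1, fsr)        # limit to 2**bits levels
    out[nz] = np.sign(x[nz]) * np.exp2(k)         # reconstruct +/- 2^k
    return out

# Weights of a trained layer cluster near zero, so a few powers of two suffice.
w = np.array([0.03, -0.4, 0.11, 0.0, 0.9])
print(log_quantize(w, bits=3))   # -> [ 0.03125 -0.5  0.125  0.  1. ]
```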

Key Contributions and Findings

  1. Higher Accuracy at Equal Resolution: The paper demonstrates that logarithmic data representation sustains higher classification accuracies than linear (fixed-point) quantization at the same bit-width. Notably, networks evaluated with a 3-bit logarithmic representation showed minimal accuracy loss, whereas 3-bit fixed-point formats led to significant performance drops.
  2. Reduction of Computational Complexity: Because every log-quantized value is a signed power of two, the multiplications in a dot product reduce to bit shifts, obviating the need for digital multipliers and simplifying the hardware required for convolution and fully connected layers (see the sketch after this list). This change also matches the non-uniform distribution of weights and activations in trained networks.
  3. Implications of Non-uniform Weight Distributions: Training with weight decay yields weights that cluster around zero, and activations are likewise non-uniformly distributed near zero. The proposed logarithmic representation exploits these distributions to compress the data more efficiently.
  4. Efficacy of Logarithmic Backpropagation: The authors propose training in the logarithmic domain, including a backpropagation scheme that remains robust with quantized gradients (a minimal sketch follows this list). This enables effective end-to-end training at 5-bit resolution, matching or surpassing linear 5-bit training as well as more aggressively quantized schemes such as BinaryNet, which rely on unquantized gradients.
  5. Experimental Validation: Experiments conducted on standard datasets such as CIFAR-10 and ILSVRC-2012 validate the proposed method. Using architectures like AlexNet and VGG16, the logarithmic approach achieved notable accuracy with significantly reduced model sizes. For instance, AlexNet's model size was reduced from 1.9 GB to 0.27 GB using 4-bit logarithmic quantization for weights.
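
To make point 2 concrete, here is a minimal sketch of a multiplier-free dot product, assuming activations are stored only as integer base-2 exponents (the output of a log quantizer) and weights are plain integers in a fixed-point format; the names and data layout are illustrative rather than the paper's implementation, which accumulates bitshifted partial sums in hardware.

```python
def shift_dot(weights_fixed, act_exponents):
    """Dot product without multiplies: sum_i of w_i shifted by x~_i.

    weights_fixed : weights as plain integers (an illustrative fixed-point form).
    act_exponents : log-quantized activations stored only as their base-2
                    exponents x~_i (integers, possibly negative).
    Because each activation is a power of two, each product w_i * 2**x~_i is
    just an arithmetic shift of w_i.
    """
    acc = 0
    for w, e in zip(weights_fixed, act_exponents):
        acc += (w << e) if e >= 0 else (w >> -e)   # shift replaces multiply
    return acc

# 3*2 + (-8)*0.25 + 5*1 = 9, computed with shifts only.
print(shift_dot([3, -8, 5], [1, -2, 0]))   # -> 9
```

Right shifts floor their results, which mirrors the truncation inherent in a fixed-point accumulation datapath.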

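For point 4, the following sketch reuses the hypothetical log_quantize helper above to snap gradients onto a 5-bit base-2 grid before a plain SGD update; the paper's actual training procedure quantizes additional tensors at more stages, so this only shows where log-domain gradient quantization enters the loop.

```python
def sgd_step_with_log_grads(weights, grads, lr=0.01, bits=5, fsr=0):
    """One illustrative SGD update with gradients snapped to a base-2 log grid.

    Reuses the log_quantize() sketch above. The paper's end-to-end procedure
    also quantizes activations and weights in the forward and backward passes;
    this shows only the gradient-quantization step at 5 bits.
    """
    q_grads = log_quantize(grads, bits=bits, fsr=fsr)   # 5-bit log-domain gradients
    return weights - lr * q_grads
```
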
Implications and Future Research

The insights derived from this work have substantial implications for the design of CNNs, particularly in resource-constrained environments such as embedded systems and mobile devices. The use of logarithmic representation not only conserves memory and reduces computational demands but also enhances the feasibility of deploying deep learning models in practical applications where energy efficiency and processing speed are critical.

Theoretically, this paper pushes the boundaries of how numerical representation can be leveraged to optimize the performance of neural networks. Future research could explore broader applications of logarithmic data representation, potentially extending to other types of neural architectures beyond CNNs. Additionally, investigating the trade-offs between computational efficiency, model interpretability, and accuracy in various data settings could further advance this field.

In summary, this paper provides a compelling case for adopting logarithmic data representation in CNNs, offering tangible benefits for both model performance and hardware efficiency. It paves the way for further exploration into alternative numeric representations in machine learning, promising to broaden the horizons of neural network applications in areas previously thought impractical due to hardware limitations.