- The paper proposes a method for converting floating-point DCNs to fixed-point, optimizing storage, memory, and power efficiency.
- It formulates an optimization problem for bit-width allocation, achieving over 20% model size reduction on CIFAR-10 without accuracy loss.
- The study combines theoretical SQNR analysis with empirical validation, paving the way for efficient deep learning deployment on embedded hardware.
Fixed Point Quantization of Deep Convolutional Networks
The paper "Fixed Point Quantization of Deep Convolutional Networks" by Darryl D. Lin, Sachin S. Talathi, and V. Sreekanth Annapureddy proposes a detailed approach to the fixed point implementation of Deep Convolutional Networks (DCNs), targeting their deployment on embedded hardware systems. This research addresses the challenge posed by the increased computational complexity and substantial resource requirements associated with modern DCNs, particularly when utilized for tasks like image recognition. The goal is to provide an efficient methodology for converting floating point DCNs into fixed point equivalents, with the aim of optimizing model storage, memory bandwidth, and power consumption without compromising accuracy.
Introduction and Motivation
DCNs have demonstrated remarkable performance improvements in image and speech recognition tasks, and they have been integral to advancements in tasks such as object detection and image retrieval. However, these improvements come at the cost of significantly increased computational and storage demands. Deploying such resource-intensive DCNs on hardware with limited computational power, such as mobile devices or embedded systems, necessitates efficient methods to reduce complexity. Fixed point implementation presents a viable solution due to its advantages in reducing computation time, memory usage, and overall power consumption.
Quantizer Design and Optimization
The authors introduce a quantizer design for converting the weights and activations of a floating point DCN to fixed point, and formulate an optimization problem that identifies the best bit-width allocation across the layers of the network. The underlying observation is that layers differ in how much quantization error they can tolerate and in how many parameters they hold, so allowing bit-widths to vary per layer keeps quantization-induced error small while delivering significant reductions in model size.
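To make the setting concrete, the following is a minimal sketch of per-layer uniform fixed point quantization of a weight tensor, with the resulting SQNR measured in dB. The function name, the symmetric range choice, and the rounding scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_uniform(x, bit_width):
    """Quantize a tensor onto a symmetric uniform fixed-point grid (a sketch)."""
    levels = 2 ** bit_width                       # number of representable values
    max_abs = np.max(np.abs(x)) + 1e-12           # guard against a zero step size
    step = 2 * max_abs / levels                   # quantization step size
    q = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1)
    return q * step                               # de-quantized (float) values

# Example: quantize a layer's weights to 8 bits and measure the SQNR in dB.
w = np.random.randn(256, 128).astype(np.float32)
w_q = quantize_uniform(w, bit_width=8)
sqnr_db = 10 * np.log10(np.sum(w ** 2) / np.sum((w - w_q) ** 2))
print(f"8-bit weight SQNR: {sqnr_db:.1f} dB")
```

The same kind of quantizer can be applied to activations; the question the paper answers is how many bits each such quantization step should get.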
Experiments on the CIFAR-10 benchmark show that the optimized fixed point DCNs achieve more than a 20% reduction in model size with no loss in accuracy, compared to networks quantized with the same bit-width in every layer. Moreover, when fine-tuning is applied after conversion, the authors report a state-of-the-art fixed point error rate of 6.78% on CIFAR-10.
Theoretical Insights and Cross-Layer Bit-Width Optimization
The paper provides a thorough analysis of the relationship between quantization noise and classification accuracy, showing that the inverse of the output Signal-to-Quantization-Noise Ratio (SQNR) is the sum of the inverse SQNRs introduced by the individual quantization steps, so every step contributes equally to the SQNR observed at the output. Because each step weighs equally on the output while layers differ widely in parameter count, trimming bits from parameter-heavy layers yields the largest savings in model size for a given SQNR penalty; consequently, the optimized strategy assigns relatively lower bit-widths to layers with more parameters, a conclusion that is both derived analytically and validated empirically.
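In compact form, and with notation that is illustrative rather than necessarily the paper's exact symbols, the analysis rests on two relations: the per-step SQNRs combine harmonically at the output, and each step's SQNR grows roughly linearly in dB with its bit-width, at a quantization efficiency of at most about 6 dB per bit (less for bell-shaped input distributions such as Gaussian-like weights).

```latex
\frac{1}{\gamma_{\text{out}}} \;=\; \sum_{i} \frac{1}{\gamma_i},
\qquad
10 \log_{10} \gamma_i \;\approx\; \kappa \, n_i \ \text{dB}
```

Here \gamma_i is the SQNR of the i-th quantization step, n_i its bit-width, and \kappa the quantization efficiency in dB per bit.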
The optimization problem is posed as minimizing overall model size subject to a prescribed minimum output SQNR, and it admits a water-filling-style closed-form solution. The resulting bit-width allocation is validated with comprehensive experiments on CIFAR-10 and on an AlexNet-like network trained on ImageNet-1000. On CIFAR-10 the optimized allocation delivers the notable 20% reduction in model size at sustained accuracy, whereas the effect is less pronounced on the AlexNet-like network because its fully-connected layers dominate the parameter count.
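As a rough illustration of the water-filling flavor of this solution, the sketch below allocates bit-widths from layer parameter counts under the simplified SQNR model above; the function, the default quantization efficiency, and the clipping range are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def allocate_bit_widths(param_counts, target_sqnr_db, kappa_db_per_bit=6.0):
    """Water-filling-style bit-width allocation (illustrative sketch only).

    Model (an assumption): a quantization step with n_i bits yields an SQNR of
    roughly kappa_db_per_bit * n_i dB, and the inverse output SQNR is the sum
    of the inverse per-step SQNRs.  Minimizing the model size sum(rho_i * n_i)
    subject to a target output SQNR then gives a closed form in which n_i
    falls off with the logarithm of the layer's parameter count rho_i.
    """
    rho = np.asarray(param_counts, dtype=np.float64)
    gamma_target = 10.0 ** (target_sqnr_db / 10.0)        # target output SQNR, linear scale
    c = 10.0 ** (kappa_db_per_bit / 10.0)                 # per-bit SQNR gain, linear scale
    n = np.log(gamma_target * rho.sum() / rho) / np.log(c)
    return np.clip(np.ceil(n), 2, 16).astype(int)         # integer bits in a sane range

# Hypothetical layer sizes: convolutional layers are small, the fc layer dominates.
layers = {"conv1": 2_400, "conv2": 25_600, "conv3": 51_200, "fc1": 1_000_000}
bits = allocate_bit_widths(list(layers.values()), target_sqnr_db=30.0)
print(dict(zip(layers, bits)))   # parameter-heavy layers receive fewer bits
```

In this simplified model a layer with four times as many parameters is assigned roughly one fewer bit, which is why parameter-heavy fully-connected layers end up with the smallest bit-widths and why the relative size savings shrink when such layers already dominate the network.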
Practical Implications and Future Directions
This research underscores the practical significance of fixed point DCNs in resource-constrained environments. It offers an analytical method for optimizing bit-width allocation, making it feasible to deploy high-performing DCNs on less powerful hardware. The insights derived from SQNR analysis and quantization noise contribute to the theoretical foundation of deep learning quantization.
Future developments in this domain may explore combining the presented quantizer design with additional methods for reducing model complexity, such as model pruning and compression. Additionally, the empirical determination of quantization efficiency under various input distributions beyond Gaussian assumptions could further refine the optimization strategies.
Conclusion
In conclusion, this paper presents a structured and principled approach to fixed point quantization of DCNs, achieving substantial model size reductions while maintaining accuracy. This is particularly valuable for the deployment of sophisticated DCNs in real-world applications on embedded hardware. The findings and methodologies proposed not only enhance the understanding of quantization effects but also offer practical solutions for optimizing deep learning models for efficient hardware implementation. This work paves the way for further optimization and fine-tuning strategies tailored to the specific constraints and requirements of various deep learning applications.