Compressing Deep Convolutional Networks using Vector Quantization
Overview
The paper "Compressing Deep Convolutional Networks using Vector Quantization" by Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev addresses the challenge of reducing the storage requirements of deep convolutional neural networks (CNNs). This is particularly relevant given the substantial storage needs of CNNs, which often limits their deployment on resource-constrained devices such as cell phones and embedded systems. The authors propose to leverage vector quantization methods to compress the parameters, specifically focusing on the dense connected layers which are notably storage-intensive.
Key Contributions
The paper outlines several significant contributions:
- Systematic Exploration of Vector Quantization: Unlike prior work that focused on matrix factorization for speeding up inference, this paper systematically explores vector quantization methods for compressing the parameters of densely connected layers, with storage reduction as the explicit goal.
- Comprehensive Evaluation: A thorough evaluation of vector quantization methods, including binarization, scalar quantization with k-means, product quantization (PQ), and residual quantization (RQ), is performed. The paper shows that structured quantization such as PQ works significantly better than the other methods.
- Practical Implications and Verification: The authors not only demonstrate compression efficacy on image classification tasks but also verify that the compressed models generalize to image retrieval tasks.
Numerical Results
The authors report strong numerical results, achieving 16-24 times compression of the network parameters with only about a 1% loss in classification accuracy on the ImageNet challenge. This demonstrates the practical feasibility of deploying large CNNs on storage-constrained devices without a significant performance trade-off.
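As a rough illustration of where a figure in this range can come from, here is a back-of-envelope compression-rate calculation for product quantization of a single fully-connected layer. The layer size and quantization settings below are assumptions for illustration, not the paper's exact configuration:

```python
import math

# Hypothetical fully-connected layer: an m x n weight matrix of 32-bit floats.
m, n = 4096, 4096
original_bits = 32 * m * n

# Assumed product-quantization settings: sub-vectors of length s, k centers per sub-space.
s, k = 4, 32

index_bits = m * (n // s) * math.log2(k)   # one k-way code per length-s sub-vector
codebook_bits = 32 * k * s * (n // s)      # each sub-space keeps k centers of length s

rate = original_bits / (index_bits + codebook_bits)
print(f"approximate compression: {rate:.1f}x")  # ~21x for these illustrative settings
```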
Detailed Methodology
The methodologies explored are outlined in two primary categories:
- Matrix Factorization Methods: Singular Value Decomposition (SVD) is evaluated as a baseline but shows limited efficacy for this application, because the two factor matrices it produces still require considerable storage (see the SVD sketch after this list).
- Vector Quantization Methods:
- Binarization: The simplest form of quantization, keeping only the sign of each weight, which yields roughly 32x compression at one bit per weight.
- Scalar Quantization using k-means: Surprisingly effective; this method clusters the scalar weight values and stores a small codebook plus a per-weight index to reduce storage (both this and binarization are sketched after this list).
- Product Quantization (PQ): The paper finds PQ to be particularly powerful; it splits the weight matrix column-wise into sub-spaces and performs k-means clustering within each, allowing a finer trade-off between compression and accuracy (sketched after this list).
- Residual Quantization (RQ): Though explored, RQ was found less effective for this task, primarily because its codebooks grow large and add complexity (a sketch follows the list).
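To make the storage argument for SVD concrete, here is a minimal truncated-SVD sketch in NumPy; the matrix size and rank are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical fully-connected weight matrix (m x n) and truncation rank.
m, n, rank = 1024, 1024, 64
W = np.random.randn(m, n).astype(np.float32)

# Truncated SVD: W ~= U_k * diag(S_k) * V_k^T
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_k, S_k, Vt_k = U[:, :rank], S[:rank], Vt[:rank, :]
W_approx = (U_k * S_k) @ Vt_k

# The two factor matrices still hold rank * (m + n) floats, so the
# compression rate is only m*n / (rank * (m + n)) = 8x in this example.
print(m * n / (rank * (m + n)))
```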
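Below is a minimal sketch of the two simplest schemes, binarization and k-means scalar quantization, using NumPy and scikit-learn; the weight-matrix size and codebook size are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

W = np.random.randn(512, 512).astype(np.float32)  # hypothetical FC-layer weights

# Binarization: keep only the sign of each weight (1 bit each -> roughly 32x compression).
W_bin = np.sign(W)

# Scalar quantization with k-means: cluster the scalar weight values into k centers,
# then store the k-entry codebook plus one small integer index per weight.
k = 16  # 4 bits per index -> roughly 32/4 = 8x compression, ignoring the codebook
km = KMeans(n_clusters=k, n_init=4).fit(W.reshape(-1, 1))
codebook = km.cluster_centers_.ravel()       # k float values
indices = km.labels_.reshape(W.shape)        # per-weight index into the codebook
W_sq = codebook[indices]                     # reconstructed weights used at inference
```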
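A minimal product-quantization sketch along the lines described above: the weight matrix is split column-wise into sub-spaces and each sub-space gets its own k-means codebook. Sizes and settings are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def product_quantize(W, s=4, k=16):
    """Quantize each m x s column block of W with its own k-means codebook."""
    m, n = W.shape
    assert n % s == 0, "sub-vector length must divide the number of columns"
    W_hat = np.empty_like(W)
    for j in range(0, n, s):
        block = W[:, j:j + s]                       # m sub-vectors of length s
        km = KMeans(n_clusters=k, n_init=4).fit(block)
        # In a real codec one would store km.cluster_centers_ (k x s floats)
        # plus one index per row; here we simply reconstruct the block.
        W_hat[:, j:j + s] = km.cluster_centers_[km.labels_].astype(W.dtype)
    return W_hat

W = np.random.randn(512, 512).astype(np.float32)    # hypothetical FC-layer weights
W_pq = product_quantize(W)
```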
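Finally, a sketch of residual quantization, in which each stage quantizes the residual left by the previous stage; note that every stage adds another full codebook, which is the storage drawback mentioned above. Again, sizes and settings are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def residual_quantize(W, k=16, stages=3):
    """Quantize the rows of W, then recursively quantize the remaining residual."""
    residual = W.copy()
    W_hat = np.zeros_like(W)
    for _ in range(stages):
        km = KMeans(n_clusters=k, n_init=4).fit(residual)
        approx = km.cluster_centers_[km.labels_].astype(W.dtype)  # one codeword per row
        W_hat += approx                   # running sum of per-stage codewords
        residual -= approx                # what is left for the next stage to encode
    return W_hat

W = np.random.randn(512, 512).astype(np.float32)
W_rq = residual_quantize(W)
```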
Implications and Future Directions
The implications of this research are manifold. Practically, it allows for the deployment of high-performance CNNs on embedded devices, thus enabling advanced applications in mobile and edge computing environments where bandwidth and storage are limited.
Theoretically, this paper corroborates findings from earlier research indicating high redundancy and over-parameterization in CNNs, suggesting that most parameters can be predicted or significantly quantized without substantial performance degradation. This insight opens avenues for further research into architectural design principles that inherently utilize fewer parameters.
Future research should focus on optimizing hardware operations to support these compressed models efficiently. Additionally, applying fine-tuning to the models post-compression may lead to even higher accuracy retention. Extending these vector quantization techniques to compress convolutional layers and beyond CNNs could also provide broader benefits across diverse neural network architectures.
Conclusion
This paper presents a compelling case for using vector quantization to compress the densely connected layers of CNNs, achieving significant storage reduction while maintaining competitive performance. The insights and methodologies discussed pave the way for more efficient deployment of deep learning models in constrained environments, supporting broader applications and accessibility in the field of artificial intelligence.