Scalable Model Compression by Entropy Penalized Reparameterization (1906.06624v3)

Published 15 Jun 2019 in cs.LG, cs.CV, and stat.ML

Abstract: We describe a simple and general neural network weight compression approach, in which the network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using a simple arithmetic coder after training. Classification accuracy and model compressibility is maximized jointly, with the bitrate--accuracy trade-off specified by a hyperparameter. We evaluate the method on the MNIST, CIFAR-10 and ImageNet classification benchmarks using six distinct model architectures. Our results show that state-of-the-art model compression can be achieved in a scalable and general way without requiring complex procedures such as multi-stage training.

Citations (41)

Summary

  • The paper proposes a novel model compression technique using entropy penalized reparameterization to balance bitrate and accuracy.
  • The method reparameterizes network parameters into a latent space and applies scalar quantization with a continuous surrogate for robust training.
  • Experimental results on datasets like ImageNet and CIFAR-10 show significant compression factors while retaining state-of-the-art accuracy.

Scalable Model Compression by Entropy Penalized Reparameterization

The paper "Scalable Model Compression by Entropy Penalized Reparameterization" addresses a significant challenge in deep learning: the storage and deployment of artificial neural networks (ANNs). While ANNs have been shown to perform exceptionally well across various tasks, their deployment can often be hindered by the substantial space their parameters require, which can complicate storage or transmission, especially for resource-constrained environments.

Approach and Methodology

The authors propose a model compression technique that reparameterizes the network parameters into a 'latent' space equipped with a learned probability model. An entropy penalty derived from this probability model regularizes the latent representation during training, and after training the representation is compressed with a simple arithmetic coder. Crucially, the trade-off between bitrate and accuracy is controlled by a single hyperparameter, allowing the method to reach state-of-the-art compression while avoiding the complex, multi-stage procedures used by many prior approaches.
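As a rough formalization (the notation below is illustrative rather than taken verbatim from the paper), the training objective can be written as a classification loss plus a weighted code-length term:

```latex
% Illustrative notation: \Phi are the latent parameters, f is the parameter
% decoder mapping latents back to weights, q is the learned probability model,
% and \lambda sets the bitrate--accuracy trade-off.
\mathcal{L}(\Phi, q) =
  \underbrace{\mathbb{E}_{(x,y)}\!\left[\,\ell\!\left(\mathrm{net}\!\left(x;\, f(\hat{\Phi})\right),\, y\right)\right]}_{\text{classification loss}}
  \;+\;
  \lambda \underbrace{\sum_i -\log_2 q\!\left(\hat{\phi}_i\right)}_{\text{entropy / code-length penalty}},
  \qquad \hat{\Phi} = \operatorname{round}(\Phi).
```

The second term estimates the number of bits the arithmetic coder will need for the quantized latents, so minimizing the joint objective trades accuracy against final model size.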

This model compression is primarily realized by the following key steps:

  1. Reparameterization: Instead of manipulating the ANN parameters directly, they are mapped to a latent space, and scalar quantization (SQ) is applied in this transformed space rather than to the weights themselves.
  2. Entropy Penalization: The training objective combines the classification loss with a penalty on the entropy of the reparameterized parameters under the learned probability model. Minimizing this joint objective promotes compressibility as well as accuracy.
  3. Optimization: Although the quantized reparameterized parameters are discrete, the method still trains with stochastic gradient descent by using a continuous surrogate, supported by techniques such as the straight-through estimator, to propagate gradients through the quantization (a minimal sketch of this training loop follows below).
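To make these steps concrete, below is a minimal PyTorch sketch of a single entropy-penalized layer and one training step. It is an illustration under simplifying assumptions, not the authors' implementation: the class name `EntropyPenalizedLinear`, the integer-bin probability model with linear interpolation, and the hyperparameters `num_levels` and `lambda_rate` are placeholders for the paper's more elaborate parameter decoders and learned density model.

```python
# Minimal sketch (assumed PyTorch code, not the authors' implementation) of
# entropy-penalized reparameterization with a straight-through estimator.
import torch
import torch.nn.functional as F


class EntropyPenalizedLinear(torch.nn.Module):
    """Linear layer whose weights are stored as quantizable latent values."""

    def __init__(self, in_features, out_features, num_levels=64):
        super().__init__()
        # Latent ("reparameterized") weights, trained with SGD.
        self.latent = torch.nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        # Learned probability model over integer quantization bins (simplified).
        self.logits = torch.nn.Parameter(torch.zeros(num_levels))
        self.num_levels = num_levels

    def _quantize(self):
        # Straight-through estimator: round in the forward pass,
        # identity gradient in the backward pass.
        return self.latent + (self.latent.round() - self.latent).detach()

    def rate_bits(self):
        # Differentiable estimate of the code length (in bits) of the quantized
        # weights under the learned probability model. Probabilities of the two
        # nearest bins are linearly interpolated so gradients also reach `latent`
        # (a simplification of the continuous surrogate used in the paper).
        probs = F.softmax(self.logits, dim=0)
        centers = torch.arange(self.num_levels, device=probs.device) - self.num_levels // 2
        x = self.latent.clamp(centers[0].item(), centers[-1].item())
        lower = torch.clamp(x.floor().long() + self.num_levels // 2, 0, self.num_levels - 2)
        frac = x - x.floor()
        p = (1 - frac) * probs[lower] + frac * probs[lower + 1]
        return (-torch.log2(p + 1e-9)).sum()

    def forward(self, x):
        return F.linear(x, self._quantize(), self.bias)


# Toy usage: jointly minimize classification loss and the entropy penalty;
# lambda_rate plays the role of the bitrate--accuracy trade-off hyperparameter.
model = torch.nn.Sequential(EntropyPenalizedLinear(784, 128), torch.nn.ReLU(),
                            EntropyPenalizedLinear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_rate = 1e-4

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
rate = sum(m.rate_bits() for m in model if isinstance(m, EntropyPenalizedLinear))
loss = F.cross_entropy(model(x), y) + lambda_rate * rate
opt.zero_grad()
loss.backward()
opt.step()
```

After training, the rounded latents would be entropy-coded (for example with an arithmetic coder) using the same learned probability model, so the penalty minimized during training is a direct estimate of the final compressed size in bits.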

Experimental Results

The authors evaluated their method on several benchmark datasets, including MNIST, CIFAR-10, and ImageNet, using six different model architectures (e.g., ResNet, VGG). The results show that the proposed method achieves model sizes and classification accuracies competitive with the state of the art. The method is particularly attractive for larger models, where it retains a simple single-stage procedure instead of the complex, staged compression pipelines typical of prior work.

  • For ResNet on ImageNet, the method provided a compression factor up to 24x while retaining competitive accuracy.
  • On CIFAR-10 with VGG-16, it yielded a model compression of 590x with only a minor increase in error, demonstrating robust performance across architectures.

Implications and Future Directions

This paper lays a foundation for practical deployment scenarios in which model size directly impacts efficiency, without substantially sacrificing accuracy. By reducing compression to a single training stage, the method can benefit on-device deep learning applications, particularly on mobile and embedded systems.

Future research can extend this approach with more sophisticated parameter decoders to further improve the flexibility and effectiveness of the compression. Moreover, combining the method with pruning to further reduce resource requirements could provide a more comprehensive solution for neural network deployment.

Overall, this paper takes a significant step toward efficient and scalable model compression by combining classical compression principles with modern neural network training, paving the way for practical advances in deploying deep learning models.
