Quantized Sparse Weight Decomposition for Neural Network Compression (2207.11048v1)

Published 22 Jul 2022 in cs.LG

Abstract: In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorizations of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable to both moderate and extreme compression regimes.
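
To make the core idea concrete, here is a minimal NumPy sketch of approximating a weight matrix W by a product of sparse, quantized factors using projected gradient descent. The rank, sparsity fraction, number of quantization levels, and the straight-through-style update (full-precision latent factors updated with gradients taken through the projected factors) are illustrative assumptions, not the paper's exact algorithm, and the end-to-end fine-tuning step described in the abstract is omitted.

```python
import numpy as np

def project_sparse(M, keep_frac):
    """Zero out all but the largest-magnitude entries (hard thresholding)."""
    k = max(1, int(keep_frac * M.size))
    thresh = np.partition(np.abs(M).ravel(), -k)[-k]
    return np.where(np.abs(M) >= thresh, M, 0.0)

def project_quantized(M, num_levels):
    """Round entries to a uniform grid spanning the matrix's value range."""
    lo, hi = M.min(), M.max()
    if hi == lo:
        return M
    step = (hi - lo) / (num_levels - 1)
    return lo + np.round((M - lo) / step) * step

def sparse_quantized_factorize(W, rank=32, keep_frac=0.25,
                               num_levels=16, lr=1e-2, steps=500):
    """Find W ~ A @ B with sparse, quantized factors (hypothetical hyperparameters)."""
    m, n = W.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.1, size=(m, rank))   # full-precision latent factors
    B = rng.normal(scale=0.1, size=(rank, n))
    for _ in range(steps):
        # Project onto the constraint set (sparse + quantized) before the forward pass.
        Aq = project_quantized(project_sparse(A, keep_frac), num_levels)
        Bq = project_quantized(project_sparse(B, keep_frac), num_levels)
        R = Aq @ Bq - W                          # reconstruction residual
        A -= lr * (R @ Bq.T)                     # gradient of 0.5 * ||Aq @ Bq - W||_F^2
        B -= lr * (Aq.T @ R)
    Aq = project_quantized(project_sparse(A, keep_frac), num_levels)
    Bq = project_quantized(project_sparse(B, keep_frac), num_levels)
    return Aq, Bq                                # the stored, compressed factors

# Example: compress a random matrix; at inference the weight is rebuilt as Aq @ Bq.
W = np.random.default_rng(1).normal(size=(256, 256))
Aq, Bq = sparse_quantized_factorize(W)
rel_err = np.linalg.norm(Aq @ Bq - W) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because the stored factors are sparse and low bit-width, they can take far less memory than the dense weight tensor; the cost is one extra matrix product to regenerate the weights on the fly during inference, as the abstract describes.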

Citations (3)