Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
Abstract: Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization. We consider Iterative Product Quantization (iPQ) with Quant-Noise to be state-of-the-art in this area, but this quantization framework suffers from preventable inference quality degradation due to prevalent empty clusters. In this paper, we propose several novel enhancements aiming to improve the accuracy of iPQ with Quant-Noise by focusing on resolving empty clusters. Our contribution, which we call Partitioning-Guided k-means (PG k-means), is a heavily augmented k-means implementation composed of three main components. First, we propose a partitioning-based pre-assignment strategy that ensures no initial empty clusters and encourages an even weight-to-cluster distribution. Second, we propose an empirically superior empty cluster resolution heuristic executed via cautious partitioning of large clusters. Finally, we construct an optional optimization step that consolidates intuitively dense clusters of weights to ensure shared representation. The proposed approach consistently reduces the number of empty clusters in iPQ with Quant-Noise by 100x on average, uses 8x fewer iterations during empty cluster resolution, and improves overall model accuracy by up to 12%, when applied to RoBERTa on a variety of tasks in the GLUE benchmark.
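The abstract does not spell out how an empty cluster is resolved, so the following is only a generic sketch of the underlying idea: when a k-means iteration leaves a cluster with no members, part of the largest cluster is handed to it. It is not the PG k-means procedure. The function names, the "split the largest cluster at the median of its widest dimension" rule, and the random initialization are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def assign(points, centroids):
    """Label each point with its nearest centroid (ties go to the lowest index)."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def split_largest_into(points, labels, empty, k):
    """Hypothetical heuristic: give the upper half of the largest cluster,
    split along its widest dimension, to the empty cluster `empty`."""
    sizes = [labels.count(c) for c in range(k)]
    big = max(range(k), key=lambda c: sizes[c])
    members = [i for i, lab in enumerate(labels) if lab == big]

    def spread(d):
        vals = [points[i][d] for i in members]
        return max(vals) - min(vals)

    widest = max(range(len(points[0])), key=spread)
    members.sort(key=lambda i: points[i][widest])
    # The lower half stays in `big`; the upper half moves to `empty`.
    for i in members[len(members) // 2:]:
        labels[i] = empty


def kmeans_no_empty(points, k, iters=20, seed=0):
    """Lloyd-style k-means that repairs empty clusters in every iteration."""
    if len(points) < k:
        raise ValueError("need at least k points to fill k clusters")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init, not the paper's pre-assignment
    labels = []
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            if c not in labels:
                split_largest_into(points, labels, c, k)
        # Every cluster now has at least one member, so each mean is defined.
        centroids = [mean([p for p, lab in zip(points, labels) if lab == c])
                     for c in range(k)]
    return centroids, labels


if __name__ == "__main__":
    # Heavy duplication makes empty clusters likely under plain k-means.
    pts = [(0.0, 0.0)] * 5 + [(1.0, 1.0)] * 3
    cents, labs = kmeans_no_empty(pts, k=4)
    print(sorted(set(labs)))
```

Because the number of points is at least k, whenever some cluster is empty the largest cluster holds at least two points, so each split leaves both halves non-empty and the repair loop never creates a new empty cluster.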