Designing strong baselines for ternary neural network quantization through support and mass equalization (2306.17442v1)
Abstract: Deep neural networks (DNNs) offer the highest performance in a wide range of computer vision applications. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically reduced by quantizing floating-point values to ternary values (2 bits, with each weight taking a value in {-1, 0, 1}), whether in a data-free (DFQ), post-training (PTQ), or quantization-aware training (QAT) scenario. In this context, we observe that rounding to the nearest value minimizes the expected error only under a uniform distribution, and thus fails to account for the skewness and kurtosis of the weight distribution, which strongly affect ternary quantization performance. This raises the following question: should one minimize the highest or the average quantization error? To answer it, we design two operators, TQuant and MQuant, that correspond to these respective minimization tasks. We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of DFQ, PTQ, and QAT scenarios, and we provide strong insights that pave the way for future research in deep neural network quantization.
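The abstract contrasts minimizing the worst-case versus the average quantization error. Below is a minimal, hypothetical sketch of what such ternary operators could look like: `tquant` splits the weight support into three equal-width bins (bounding the maximum rounding error), while `mquant` places thresholds at the empirical tertiles so each ternary level receives equal mass (targeting the average error). The threshold rules, function names, and the least-squares scale are illustrative assumptions, not the paper's exact operators.

```python
import numpy as np

def ternarize(w, lo_thresh, hi_thresh):
    """Assign each weight to a ternary level in {-1, 0, 1} via two thresholds."""
    q = np.zeros_like(w)
    q[w < lo_thresh] = -1.0
    q[w > hi_thresh] = 1.0
    return q

def tquant(w):
    """Support equalization (assumed): three equal-width bins over
    [w.min(), w.max()], which bounds the worst-case rounding error."""
    lo, hi = w.min(), w.max()
    width = (hi - lo) / 3.0
    return ternarize(w, lo + width, hi - width)

def mquant(w):
    """Mass equalization (assumed): thresholds at the empirical tertiles,
    so each ternary level receives one third of the weights."""
    lo_t, hi_t = np.quantile(w, [1.0 / 3.0, 2.0 / 3.0])
    return ternarize(w, lo_t, hi_t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A skewed, heavy-tailed weight sample: the regime where the abstract
    # says rounding-to-nearest breaks down.
    w = rng.gamma(shape=2.0, scale=0.5, size=10_000) - 1.0
    for name, quant in (("TQuant", tquant), ("MQuant", mquant)):
        q = quant(w)
        # Least-squares optimal scale for a fixed ternary assignment q:
        # argmin_s ||w - s*q||^2  =>  s = <w, q> / <q, q>.
        denom = float(np.sum(q * q))
        scale = float(np.sum(w * q)) / denom if denom > 0 else 1.0
        err = np.abs(w - scale * q)
        print(f"{name}: max error = {err.max():.3f}, mean error = {err.mean():.3f}")
```

On skewed distributions such as this one, the equal-mass thresholds typically yield a lower mean error while the equal-width thresholds yield a lower maximum error, mirroring the trade-off the abstract raises.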