Designing strong baselines for ternary neural network quantization through support and mass equalization (2306.17442v1)

Published 30 Jun 2023 in cs.CV

Abstract: Deep neural networks (DNNs) offer the highest performance in a wide range of computer vision applications. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically reduced by quantizing floating-point values to ternary values (2 bits, with each weight taking a value in {-1, 0, 1}), whether in a data-free (DFQ), post-training (PTQ) or quantization-aware training (QAT) scenario. In this context, we observe that rounding to nearest minimizes the expected error given a uniform distribution and thus does not account for the skewness and kurtosis of the weight distribution, which strongly affect ternary quantization performance. This raises the following question: should one minimize the highest or the average quantization error? To answer it, we design two operators, TQuant and MQuant, that correspond to these respective minimization tasks. We show experimentally that our approach significantly improves the performance of ternary quantization across a variety of DFQ, PTQ and QAT scenarios, and we provide strong insights to pave the way for future research in deep neural network quantization.
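
The abstract does not spell out the TQuant and MQuant operators, so the sketch below only illustrates the contrast it describes: naive round-to-nearest ternarization versus a support-oriented choice of scale (targeting the highest error) and a mass-oriented choice of threshold (targeting the average error). The function names, threshold and scale choices here (ternary_support_oriented, ternary_mass_oriented, the equal-mass quantile) are illustrative assumptions, not the paper's method.

```python
import numpy as np

def ternary_round_to_nearest(w, delta):
    """Naive ternarization: map each weight to the nearest point of the
    grid {-delta, 0, +delta}. This minimizes the expected error only if
    the weights are uniformly distributed."""
    q = np.where(np.abs(w) < delta / 2, 0.0, np.sign(w))
    return q * delta

def ternary_support_oriented(w):
    """Hypothetical support-oriented variant: choose the scale so the grid
    spans the full support of the weights, targeting the highest
    (worst-case) error at the tails of the distribution."""
    delta = np.max(np.abs(w))  # assumption: scale = largest weight magnitude
    return ternary_round_to_nearest(w, delta)

def ternary_mass_oriented(w):
    """Hypothetical mass-oriented variant: place the zero threshold so
    roughly a third of the weights fall in each ternary bin, targeting the
    average error for skewed or heavy-tailed distributions."""
    t = np.quantile(np.abs(w), 1.0 / 3.0)       # assumption: equal-mass bins
    delta = np.mean(np.abs(w)[np.abs(w) >= t])  # assumption: mean of kept magnitudes
    return np.where(np.abs(w) < t, 0.0, np.sign(w)) * delta

# Toy comparison on a heavy-tailed (non-uniform) weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=10_000) * 0.05
for quantize in (ternary_support_oriented, ternary_mass_oriented):
    mse = np.mean((w - quantize(w)) ** 2)
    print(f"{quantize.__name__}: MSE = {mse:.6f}")
```

On such a non-uniform weight tensor the two choices yield noticeably different errors, which is the effect the abstract attributes to the skewness and kurtosis of the weight distribution.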

