AQD: Towards Accurate Fully-Quantized Object Detection (2007.06919v5)

Published 14 Jul 2020 in cs.CV

Abstract: Network quantization allows inference to be conducted with low-precision arithmetic, improving the inference efficiency of deep neural networks on edge devices. However, designing aggressively low-bit (e.g., 2-bit) quantization schemes for complex tasks such as object detection remains challenging: performance degrades severely and the efficiency gains are hard to verify on common hardware. In this paper, we propose an Accurate Quantized object Detection solution, termed AQD, that eliminates floating-point computation entirely. To this end, we use fixed-point operations in all layers, including the convolutional layers, normalization layers, and skip connections, so that inference can be executed with integer-only arithmetic. To demonstrate the improved latency-vs-accuracy trade-off, we apply the proposed method to RetinaNet and FCOS. In particular, experimental results on the MS-COCO dataset show that AQD achieves comparable or even better performance than its full-precision counterpart under extremely low-bit schemes, which is of great practical value. Source code and models are available at: https://github.com/ziplab/QTool
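The core idea behind integer-only inference is that every floating-point operation in the network is replaced by fixed-point arithmetic: weights and activations are mapped to low-bit integers, and the per-layer rescaling that would normally need a floating-point multiplier is folded into an integer multiply followed by a bit shift. The snippet below is a minimal, generic sketch of that pattern, not the authors' AQD implementation; the helper names quantize_uniform and requantize, the 2-bit setting, and the toy 1x1 convolution are illustrative assumptions.

import numpy as np

def quantize_uniform(x, num_bits, x_max):
    # Map float values in [-x_max, x_max] onto a signed num_bits integer grid.
    qmax = 2 ** (num_bits - 1) - 1            # 2-bit signed -> levels {-2, -1, 0, 1}
    scale = x_max / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def requantize(acc, in_scale, w_scale, out_scale, shift=16):
    # Replace the floating-point rescale (in_scale * w_scale / out_scale)
    # with a precomputed integer multiplier followed by a right shift,
    # keeping the whole path in integer arithmetic.
    multiplier = int(round(in_scale * w_scale / out_scale * (1 << shift)))
    return (acc.astype(np.int64) * multiplier) >> shift

# Toy example: a 1x1 "convolution" over a 4-channel feature map with
# 2-bit weights and activations, accumulated and rescaled in integers.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8)).astype(np.float32)   # input feature map (C, H, W)
w = rng.normal(size=(4, 4)).astype(np.float32)      # 1x1 conv weights (out_C, in_C)

xq, x_scale = quantize_uniform(x, num_bits=2, x_max=float(np.abs(x).max()))
wq, w_scale = quantize_uniform(w, num_bits=2, x_max=float(np.abs(w).max()))

acc = np.einsum('oc,chw->ohw', wq, xq)               # integer accumulation
yq = requantize(acc, x_scale, w_scale, out_scale=0.1)
print(yq.dtype, yq.shape)                            # integer output, shape (4, 8, 8)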

Authors (5)
  1. Peng Chen (324 papers)
  2. Jing Liu (526 papers)
  3. Bohan Zhuang (79 papers)
  4. Mingkui Tan (124 papers)
  5. Chunhua Shen (404 papers)
Citations (11)