Fisher-aware Quantization for DETR Detectors with Critical-category Objectives (2407.03442v1)

Published 3 Jul 2024 in cs.CV

Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects at a finer granularity is still lacking, especially for harder tasks such as object detection, which combines classification and regression objectives. This work defines the performance on a subset of task-critical categories, i.e., the critical-category performance, as a crucial yet largely overlooked fine-grained objective for detection tasks. We analyze the impact of quantization at category-level granularity and propose methods to improve performance on the critical categories. Specifically, we find that certain critical categories are more sensitive to quantization and are prone to overfitting after quantization-aware training (QAT). To explain this, we establish theoretical and empirical links between their performance gaps and the corresponding loss landscapes using the Fisher information framework. Based on this evidence, we apply a Fisher-aware mixed-precision quantization scheme and a Fisher-trace regularization for QAT on the critical-category loss landscape. The proposed methods improve critical-category metrics of quantized transformer-based DETR detectors, and the gains grow for larger models and larger numbers of classes, where overfitting becomes more severe. For example, our methods yield 10.4% and 14.5% mAP gains for 4-bit DETR-R50 and Deformable DETR, respectively, on the most impacted critical classes of the COCO Panoptic dataset.
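
To make the two ingredients of the abstract concrete, here is a minimal PyTorch sketch: a per-layer Fisher-trace estimate of a critical-category loss used to drive mixed-precision bit allocation, and a Fisher-trace regularizer added to the QAT objective. This is illustrative only, not the paper's implementation: `critical_loss_fn` is a hypothetical helper assumed to restrict the detection loss to the critical categories, the diagonal empirical Fisher (sum of squared gradients) stands in for the exact Fisher trace, and the bucketed bit allocation is a simplified stand-in for the paper's Fisher-aware scheme.

```python
import torch

def empirical_fisher_trace(model, data_loader, critical_loss_fn, n_batches=8):
    """Per-layer empirical Fisher trace of the critical-category loss,
    approximated by the accumulated squared gradients (diagonal Fisher)."""
    traces = {n: 0.0 for n, p in model.named_parameters() if p.requires_grad}
    for i, (images, targets) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        # critical_loss_fn (hypothetical) keeps only critical-class terms
        loss = critical_loss_fn(model(images), targets)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                traces[n] += p.grad.detach().pow(2).sum().item()
    return {n: t / n_batches for n, t in traces.items()}

def allocate_bits(traces, bit_choices=(4, 6, 8)):
    """Mixed-precision heuristic: layers with a larger Fisher trace are more
    sensitive to quantization, so they receive more bits."""
    ranked = sorted(traces, key=traces.get)       # ascending sensitivity
    bucket = -(-len(ranked) // len(bit_choices))  # ceil division
    return {n: bit_choices[min(r // bucket, len(bit_choices) - 1)]
            for r, n in enumerate(ranked)}

def regularized_qat_loss(model, outputs, targets, task_loss_fn,
                         critical_loss_fn, lam=1e-4):
    """QAT objective: task loss plus a Fisher-trace penalty that flattens
    the critical-category loss landscape to curb post-QAT overfitting.
    `outputs` must come from a forward pass of `model` with the graph kept."""
    task_loss = task_loss_fn(outputs, targets)
    crit_loss = critical_loss_fn(outputs, targets)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(crit_loss, params,
                                create_graph=True, allow_unused=True)
    fisher_trace = sum(g.pow(2).sum() for g in grads if g is not None)
    return task_loss + lam * fisher_trace
```

The sensitivity ranking follows the same intuition as HAWQ-style Hessian- and Fisher-weighted mixed precision: flat directions of the critical-category loss tolerate aggressive quantization, while sharp ones need higher precision.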

