
Poster: Self-Supervised Quantization-Aware Knowledge Distillation (2309.13220v1)

Published 22 Sep 2023 in cs.CV and cs.AI

Abstract: Quantization-aware training (QAT) starts with a pre-trained full-precision model and performs quantization during retraining. However, existing QAT methods require label supervision, and they suffer accuracy loss due to the reduced precision. To address these limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD). SQAKD first unifies the forward and backward dynamics of various quantization functions and then reframes QAT as a co-optimization problem that simultaneously minimizes the KL loss and the discretization error in a self-supervised manner. The evaluation shows that SQAKD significantly improves the performance of various state-of-the-art QAT works. SQAKD establishes stronger baselines and does not require extensive labeled training data, potentially making state-of-the-art QAT research more accessible.
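The co-optimization described in the abstract can be sketched conceptually: a full-precision teacher supervises a quantized student through a KL-divergence term over softened logits (no labels needed), while a discretization term penalizes the gap between full-precision weights and their quantized counterparts. The sketch below is a minimal NumPy illustration under stated assumptions; the `quantize`, `kl_loss`, and `sqakd_loss` helpers are hypothetical names for exposition and are not the paper's actual implementation.

```python
import numpy as np

def quantize(w, bits=2):
    # Uniform symmetric quantizer: snap weights onto a small set of
    # evenly spaced levels determined by the bit width (illustrative).
    levels = 2 ** bits - 1
    scale = np.max(np.abs(w)) / (levels / 2) + 1e-12
    return np.round(w / scale) * scale

def softmax(z, T=1.0):
    # Temperature-softened softmax, numerically stabilized.
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_loss(teacher_logits, student_logits, T=4.0):
    # KL divergence between softened teacher and student distributions:
    # the self-supervised signal, requiring no ground-truth labels.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1).mean()

def sqakd_loss(teacher_logits, student_logits, w_fp, bits=2, alpha=1.0):
    # Co-optimization objective: KL term plus the discretization error
    # between full-precision weights and their quantized versions.
    disc = np.mean((w_fp - quantize(w_fp, bits)) ** 2)
    return kl_loss(teacher_logits, student_logits) + alpha * disc

# One illustrative evaluation of the combined objective.
rng = np.random.default_rng(0)
t_logits = rng.normal(size=(8, 10))                      # teacher outputs
s_logits = t_logits + 0.1 * rng.normal(size=(8, 10))     # student outputs
w = rng.normal(size=100)                                  # full-precision weights
print(float(sqakd_loss(t_logits, s_logits, w)))
```

In a real QAT loop, the gradient of the rounding step would be handled with a straight-through estimator (Bengio et al., 2013), and both terms would be minimized jointly during retraining; this sketch only shows how the two loss components combine.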

Authors (2)
  1. Kaiqi Zhao
  2. Ming Zhao