Dynamic Temperature Knowledge Distillation (2404.12711v1)

Published 19 Apr 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Temperature plays a pivotal role in moderating label softness in the realm of knowledge distillation (KD). Traditional approaches often employ a static temperature throughout the KD process, which fails to address the nuanced complexities of samples with varying levels of difficulty and overlooks the distinct capabilities of different teacher-student pairings. This leads to a less-than-ideal transfer of knowledge. To improve the process of knowledge propagation, we propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces dynamic, cooperative temperature control for both the teacher and student models simultaneously within each training iteration. In particular, we propose "sharpness" as a metric to quantify the smoothness of a model's output distribution. By minimizing the sharpness difference between the teacher and the student, we can derive sample-specific temperatures for each of them. Extensive experiments on CIFAR-100 and ImageNet-2012 demonstrate that DTKD performs comparably to leading KD techniques, with added robustness in Target Class KD and Non-target Class KD scenarios. The code is available at https://github.com/JinYu1998/DTKD.
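The sketch below illustrates the idea from the abstract: measure a per-sample "sharpness" of the teacher's and student's logits and pick cooperative, sample-specific temperatures that bring the two scaled distributions to comparable smoothness. The concrete sharpness proxy (max logit minus mean logit), the constraint tying the two temperatures to a reference temperature, and the function name are assumptions made for illustration, not the authors' exact formulation; see the official repository for the actual method.

```python
# Hypothetical PyTorch sketch of a DTKD-style loss. The sharpness proxy and the
# temperature derivation are assumptions; consult https://github.com/JinYu1998/DTKD
# for the authors' implementation.
import torch
import torch.nn.functional as F


def dtkd_style_loss(student_logits, teacher_logits, base_temp=4.0, eps=1e-8):
    """KD loss with per-sample cooperative temperatures.

    Assumed sharpness proxy: sharpness(z, T) ~ (max(z) - mean(z)) / T.
    Choosing T_tea and T_stu proportional to each model's logit gap, subject to
    T_tea + T_stu = 2 * base_temp, equalizes this proxy for teacher and student.
    """
    # Per-sample logit gaps: a rough measure of how "peaked" each output is.
    gap_s = student_logits.max(dim=1).values - student_logits.mean(dim=1)
    gap_t = teacher_logits.max(dim=1).values - teacher_logits.mean(dim=1)

    # Sample-specific temperatures, shape [batch, 1].
    denom = gap_s + gap_t + eps
    t_stu = (2.0 * base_temp * gap_s / denom).unsqueeze(1)
    t_tea = (2.0 * base_temp * gap_t / denom).unsqueeze(1)

    # Standard KL-based distillation, but with per-sample temperatures.
    log_p_s = F.log_softmax(student_logits / t_stu, dim=1)
    p_t = F.softmax(teacher_logits / t_tea, dim=1)
    kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)

    # Temperature scaling keeps gradient magnitudes comparable across samples.
    return (kd * t_stu.squeeze(1) * t_tea.squeeze(1)).mean()


if __name__ == "__main__":
    s = torch.randn(8, 100)  # student logits, e.g. a CIFAR-100 batch
    t = torch.randn(8, 100)  # teacher logits
    print(dtkd_style_loss(s, t))
```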
