Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning (2311.13934v1)

Published 23 Nov 2023 in cs.CV

Abstract: Knowledge distillation (KD) improves the performance of efficient, lightweight models (the student) by transferring knowledge from more complex models (the teacher). However, most existing KD techniques rely on the Kullback-Leibler (KL) divergence, which has two limitations. First, when the teacher distribution has high entropy, the KL divergence's mode-averaging nature prevents sufficient target information from being transferred. Second, when the teacher distribution has low entropy, the KL divergence focuses excessively on specific modes and fails to convey the teacher's richer knowledge to the student. Consequently, on datasets with many confounding or challenging samples, student models may fail to acquire sufficient knowledge and perform poorly. Furthermore, we observe that in previous KD approaches, data augmentation, a technique intended to improve a model's generalization, can actually have an adverse effect. We therefore propose Robustness-Reinforced Knowledge Distillation (R2KD), which leverages correlation distance and network pruning so that KD can effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, Tiny-ImageNet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
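
To make the KL-divergence limitation and the correlation-distance alternative concrete, the sketch below contrasts the standard temperature-scaled KL distillation loss with a Pearson-correlation-distance objective. This is a minimal illustration assuming a typical PyTorch setup; the function names, the temperature T, and the exact normalization are assumptions for illustration, not the paper's precise R2KD formulation (which additionally involves network pruning and data augmentation).

import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    # Standard Hinton-style KD: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def correlation_distance_loss(student_logits, teacher_logits, T=4.0, eps=1e-8):
    # Illustrative correlation-distance objective: 1 - Pearson correlation
    # between the softened teacher and student probability vectors, averaged
    # over the batch. It matches the relative shape of the two distributions
    # rather than their values point-wise, so it is less dominated by the
    # entropy of the teacher distribution than the KL divergence.
    p_student = F.softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    s = p_student - p_student.mean(dim=1, keepdim=True)
    t = p_teacher - p_teacher.mean(dim=1, keepdim=True)
    corr = (s * t).sum(dim=1) / (s.norm(dim=1) * t.norm(dim=1) + eps)
    return (1.0 - corr).mean()

# Example usage on random logits (batch of 8, 100 classes as in CIFAR-100):
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
print(kd_kl_loss(student_logits, teacher_logits))
print(correlation_distance_loss(student_logits, teacher_logits))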

Authors (4)
  1. Seonghak Kim (22 papers)
  2. Gyeongdo Ham (3 papers)
  3. Yucheol Cho (2 papers)
  4. Daeshik Kim (6 papers)
Citations (1)