Class-aware Information for Logit-based Knowledge Distillation (2211.14773v1)

Published 27 Nov 2022 in cs.CV and cs.LG

Abstract: Knowledge distillation aims to transfer knowledge to a student model by utilizing the predictions/features of a teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage of extra feature transformations, the training overhead of feature-based methods is much higher than that of logit-based distillation. In this work, we revisit logit-based knowledge distillation and observe that existing logit-based methods treat the prediction logits only at the instance level, overlooking much other useful semantic information. To address this issue, we propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level. CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance. We further introduce a novel loss, the Class Correlation Loss, to force the student to learn the inherent class-level correlation of the teacher. Empirical comparisons demonstrate the superiority of the proposed method over several prevailing logit-based and feature-based methods: CLKD achieves compelling results on various visual classification tasks and outperforms state-of-the-art baselines.
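
The abstract does not give the loss formulation, so the following is only a minimal sketch of what instance-level versus class-level logit distillation and a class-correlation term might look like in PyTorch. The function names, the transpose-then-softmax reading of "class-level", the correlation-matrix matching, and the loss weights `alpha`/`beta`/`gamma` are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of class-aware logit distillation (not the authors' code).
# Assumptions: "instance-level" KD is standard temperature-scaled KL on per-sample
# logits; "class-level" KD applies the same KL after transposing the batch-by-class
# logit matrix, so each class is treated as a distribution over samples in the batch;
# the Class Correlation Loss is approximated by matching class-class correlation
# (Gram) matrices built from normalized logit columns.

import torch
import torch.nn.functional as F


def instance_level_kd(student_logits, teacher_logits, T=4.0):
    """Standard logit KD: KL between softened per-sample class distributions."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)


def class_level_kd(student_logits, teacher_logits, T=4.0):
    """Class-level KD: softmax over the batch dimension for each class column."""
    p_s = F.log_softmax(student_logits.t() / T, dim=1)  # (num_classes, batch)
    p_t = F.softmax(teacher_logits.t() / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)


def class_correlation_loss(student_logits, teacher_logits):
    """Match class-class correlation matrices of student and teacher logits."""
    def corr(logits):
        z = F.normalize(logits.t(), dim=1)  # (num_classes, batch), L2-normalized rows
        return z @ z.t()                    # (num_classes, num_classes)
    return F.mse_loss(corr(student_logits), corr(teacher_logits))


def clkd_loss(student_logits, teacher_logits, labels,
              alpha=1.0, beta=1.0, gamma=1.0, T=4.0):
    """Total loss: cross-entropy + instance-level KD + class-level KD + correlation.
    The weights alpha/beta/gamma are placeholders, not values from the paper."""
    ce = F.cross_entropy(student_logits, labels)
    return (ce
            + alpha * instance_level_kd(student_logits, teacher_logits, T)
            + beta * class_level_kd(student_logits, teacher_logits, T)
            + gamma * class_correlation_loss(student_logits, teacher_logits))
```

The sketch keeps the overall structure the abstract describes: an instance-level term identical to classical logit distillation, a class-level term that exposes cross-sample (class-wise) structure, and an extra correlation term that ties the student's class relationships to the teacher's.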

Authors (4)
  1. Shuoxi Zhang (7 papers)
  2. Hanpeng Liu (7 papers)
  3. John E. Hopcroft (34 papers)
  4. Kun He (177 papers)
Citations (1)