Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
96 tokens/sec
Gemini 2.5 Pro Premium
42 tokens/sec
GPT-5 Medium
20 tokens/sec
GPT-5 High Premium
27 tokens/sec
GPT-4o
100 tokens/sec
DeepSeek R1 via Azure Premium
86 tokens/sec
GPT OSS 120B via Groq Premium
464 tokens/sec
Kimi K2 via Groq Premium
181 tokens/sec
2000 character limit reached

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients (2203.10163v1)

Published 18 Mar 2022 in cs.LG, cs.AI, and cs.CV

Abstract: Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most method designs a more efficient way to facilitate knowledge transfer, less attention has been put on comparing the effect of knowledge sources such as features, logits, and gradients. This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources, making a systematic comparison possible in model compression and incremental learning. Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design, providing a practical guideline for effective KD-based transfer learning.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.