Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models (2504.13825v1)

Published 18 Apr 2025 in cs.CL and cs.LG

Abstract: Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various applications including image classification, object detection, language modeling, text classification, and sentiment analysis. Recent innovations in KD methods, such as attention-based approaches, block-wise logit distillation, and decoupling distillation, have notably improved student model performance. These techniques focus on stimulus complexity, attention mechanisms, and global information capture to optimize knowledge transfer. In addition, KD has proven effective in compressing LLMs while preserving accuracy, reducing computational overhead, and improving inference speed. This survey synthesizes the latest literature, highlighting key findings, contributions, and future directions in knowledge distillation to provide insights for researchers and practitioners on its evolving role in artificial intelligence and machine learning.
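As background for the methods the abstract enumerates, the sketch below shows the classic temperature-scaled logit distillation objective (Hinton-style KD) in PyTorch. It is a minimal illustration of the general teacher-to-student transfer idea, not an implementation of any specific technique surveyed in the paper; the function name and the `T` and `alpha` hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft targets from the teacher (temperature T) with the
    usual cross-entropy on ground-truth labels."""
    # Soft-target term: KL divergence between temperature-scaled distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard-target term: standard cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Variants discussed in the survey, such as block-wise logit distillation or attention-based transfer, replace or augment this logit-level term with losses over intermediate features or attention maps, but the weighted combination of a teacher-alignment term and a task loss follows the same pattern.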

Authors (12)
  1. Junjie Yang (74 papers)
  2. Junhao Song (15 papers)
  3. Xudong Han (40 papers)
  4. Ziqian Bi (37 papers)
  5. Tianyang Wang (80 papers)
  6. Chia Xin Liang (13 papers)
  7. Xinyuan Song (32 papers)
  8. Yichao Zhang (66 papers)
  9. Qian Niu (158 papers)
  10. Benji Peng (30 papers)
  11. Keyu Chen (76 papers)
  12. Ming Liu (421 papers)