Multi-Granularity Semantic Revision for Large Language Model Distillation (2407.10068v1)

Published 14 Jul 2024 in cs.CL

Abstract: Knowledge distillation plays a key role in compressing LLMs: a small student model is improved under the guidance of a large teacher model. However, existing LLM distillation methods rely too heavily on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in prior work struggle to align the most informative parts of the distributions because of the complexity of LLMs' output distributions. To address these problems, we propose a multi-granularity semantic revision method for LLM distillation. At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy. SCRG first calculates the semantic cognitive difference between the teacher and student to detect erroneous tokens, then corrects each with the teacher-generated token and re-generates the sequence, reducing generation errors and enhancing generation diversity. At the token level, we design a distribution adaptive clipping Kullback-Leibler (DAC-KL) loss as the distillation objective. The DAC-KL loss uses a learnable sub-network to adaptively extract semantically dense areas of the teacher's output, avoiding interference from redundant information during distillation. Finally, at the span level, we leverage the span priors of a sequence to compute probability correlations within spans and constrain the teacher's and student's probability correlations to be consistent, further enhancing the transfer of semantic information. Extensive experiments across different model families with parameters ranging from 0.1B to 13B demonstrate the superiority of our method compared to existing methods.
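
The abstract only sketches the DAC-KL objective at a high level, so a rough PyTorch illustration may help make the idea concrete. Everything below is an assumption-laden sketch, not the paper's implementation: the module name DACKLLoss, the statistics fed to the clipping sub-network (clip_net), the soft-gate parameterisation of the clipping band, and the sharpness constant are all hypothetical stand-ins for whatever learnable sub-network and clipping scheme the authors actually use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DACKLLoss(nn.Module):
    """Illustrative distribution-adaptive-clipping KL loss (not the paper's code).

    A small learnable sub-network looks at per-position statistics of the
    teacher distribution and predicts a probability band; teacher mass inside
    the band is (softly) kept, the rest is suppressed, and a KL term is
    computed on the retained, re-normalised distribution.
    """

    def __init__(self, hidden_dim: int = 16, sharpness: float = 50.0):
        super().__init__()
        self.sharpness = sharpness  # controls how hard the soft clipping gate is
        # Maps (max prob, entropy) of the teacher distribution to two fractions
        # of the max probability that define the clipping band.
        self.clip_net = nn.Sequential(
            nn.Linear(2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
            nn.Sigmoid(),
        )

    def forward(self, student_logits, teacher_logits):
        p_t = F.softmax(teacher_logits, dim=-1)          # [B, S, V]
        log_p_s = F.log_softmax(student_logits, dim=-1)  # [B, S, V]

        # Per-position summary statistics of the teacher distribution.
        max_p = p_t.max(dim=-1, keepdim=True).values
        entropy = -(p_t * p_t.clamp_min(1e-9).log()).sum(-1, keepdim=True)
        stats = torch.cat([max_p, entropy], dim=-1)      # [B, S, 2]

        # Predicted band [lower, upper] as fractions of the max probability.
        frac = self.clip_net(stats)
        lower = torch.minimum(frac[..., :1], frac[..., 1:]) * max_p
        upper = torch.maximum(frac[..., :1], frac[..., 1:]) * max_p

        # Soft gate ~1 inside the band, ~0 outside; keeps clip_net trainable.
        gate = torch.sigmoid(self.sharpness * (p_t - lower)) * \
               torch.sigmoid(self.sharpness * (upper - p_t))
        p_clip = p_t * gate
        p_clip = p_clip / p_clip.sum(-1, keepdim=True).clamp_min(1e-9)

        # KL(clipped teacher || student) over the retained support.
        kl = (p_clip * (p_clip.clamp_min(1e-9).log() - log_p_s)).sum(-1)
        return kl.mean()


# Toy usage with random logits (vocab size 100).
if __name__ == "__main__":
    loss_fn = DACKLLoss()
    s = torch.randn(2, 8, 100)
    t = torch.randn(2, 8, 100)
    print(loss_fn(s, t).item())
```

In a full distillation loop, such a token-level term would be evaluated on aligned student and teacher logits for the same (possibly SCRG-revised) sequence and combined with the sequence- and span-level objectives described in the abstract.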

Authors (10)
  1. Xiaoyu Liu (138 papers)
  2. Yun Zhang (103 papers)
  3. Wei Li (1121 papers)
  4. Simiao Li (6 papers)
  5. Xudong Huang (8 papers)
  6. Hanting Chen (52 papers)
  7. Yehui Tang (63 papers)
  8. Jie Hu (187 papers)
  9. Zhiwei Xiong (83 papers)
  10. Yunhe Wang (145 papers)