Identify effective token-importance signals and position-selection policies for selective KD in autoregressive LLMs
Determine which token-level importance signals most reliably identify positions that benefit from logit-based knowledge distillation in autoregressive large language models, and characterize how different position-selection policies interact with these signals to yield an effective distillation curriculum.
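To make the design space concrete, below is a minimal sketch of a selective logit-distillation loss. PyTorch is an assumption (the source specifies no implementation), and the particular choices here — teacher predictive entropy as the importance signal, a fixed top-k fraction per sequence as the selection policy, and the names `selective_kd_loss` and `select_frac` — are illustrative placeholders, not the authors' method; they are exactly the two axes (signal and policy) the question asks to characterize.

```python
import torch
import torch.nn.functional as F

def selective_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      select_frac: float = 0.25,
                      temperature: float = 2.0) -> torch.Tensor:
    """Logit-based KD restricted to a selected subset of token positions.

    student_logits, teacher_logits: (batch, seq_len, vocab_size).
    The importance signal and selection policy below are illustrative
    assumptions, not a prescription from the paper.
    """
    # Temperature-scaled log-distributions.
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)

    # Per-position KL(teacher || student), shape (batch, seq_len).
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)

    # Token-importance signal (assumed): teacher predictive entropy,
    # i.e. distill where the teacher's distribution is least peaked.
    importance = -(t_logp.exp() * t_logp).sum(dim=-1)

    # Position-selection policy (assumed): top-k positions per sequence,
    # with k a fixed fraction of the sequence length.
    k = max(1, int(select_frac * kl.size(1)))
    _, top_idx = importance.topk(k, dim=1)
    mask = torch.zeros_like(kl).scatter_(1, top_idx, 1.0)

    # Mean KL over selected positions, with the usual T^2 rescaling.
    return (kl * mask).sum() / mask.sum() * temperature ** 2
```

Swapping the `importance` line for, say, the student's per-token cross-entropy, teacher-student disagreement, or a gradient-norm proxy, and swapping the top-k mask for a threshold rule or an annealed selection fraction, generates the family of signal-policy combinations whose interaction and curriculum effects this question targets.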
References
Yet, it remains unclear which token-importance signals most reliably identify positions that benefit from logit-based distillation in LLMs, and how different position-selection policies interact with these signals to shape an effective distillation curriculum.
— Rethinking Selective Knowledge Distillation (2602.01395, Tavor et al., 1 Feb 2026), Section 1 (Introduction), second paragraph