Principled mitigation of spurious linguistic artifacts in SDFT
Develop a principled method to prevent the student model in Self-Distillation Fine-Tuning (SDFT) from inheriting spurious, teacher-conditioned linguistic markers (for example, prefatory phrases such as “Based on the text...” or “Following the example...”) that arise because the teacher is conditioned on demonstrations or source passages. Such a method would eliminate the current reliance on the heuristic of masking the loss over the initial tokens during training.
References
A subtle failure mode of our approach is that the student can inherit spurious linguistic patterns from the teacher. Because the teacher is conditioned on demonstrations or text passages, it may produce responses prefaced with phrases like "Based on the text..." or "Following the example..." The student, although it receives no such context, nevertheless sometimes reproduces these markers, having learned them as part of the teacher's output distribution. Empirically, we find that masking the loss over the first few tokens during training effectively suppresses these artifacts without harming downstream accuracy. While this workaround is effective in practice, it is fundamentally a heuristic fix, and a more principled solution remains an open problem.
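The heuristic workaround described above can be sketched as a per-token cross-entropy in which the first few response positions contribute no gradient. This is an illustrative PyTorch sketch, not the paper's implementation; the function name and the `n_masked_prefix` hyperparameter (how many leading tokens to mask) are assumptions for the example.

```python
import torch
import torch.nn.functional as F


def prefix_masked_ce_loss(logits: torch.Tensor,
                          target_ids: torch.Tensor,
                          n_masked_prefix: int = 5) -> torch.Tensor:
    """Cross-entropy against the teacher-generated response, with the loss
    over the first `n_masked_prefix` response tokens zeroed out so the
    student does not learn spurious prefatory markers.

    logits:     (batch, seq_len, vocab) student logits over the response
    target_ids: (batch, seq_len) token ids of the teacher's response
    """
    # Per-token loss; cross_entropy expects the class dim second.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), target_ids, reduction="none"
    )  # shape: (batch, seq_len)

    # Zero out the loss on the leading tokens (the heuristic mask).
    mask = torch.ones_like(per_token)
    mask[:, :n_masked_prefix] = 0.0

    # Average only over the unmasked positions.
    return (per_token * mask).sum() / mask.sum()
```

Because the masked positions carry zero weight, perturbing the first few target tokens (e.g., a "Based on the text..." prefix) leaves the loss unchanged, which is exactly the behavior the workaround relies on.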