Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation (2505.11628v2)

Published 16 May 2025 in cs.CL and cs.LG

Abstract: Supervised fine-tuning (SFT) using expert demonstrations often suffers from the imitation problem, where the model learns to reproduce the correct responses without understanding the underlying rationale. To address this limitation, we propose Critique-Guided Distillation (CGD), a novel multi-stage framework that integrates explanatory critiques and refined responses generated by a teacher model into the SFT process. A student model is then trained to map the triplet of prompt, teacher critique, and its own initial response to the corresponding refined teacher response, thereby learning both what to imitate and why. Using entropy-based analysis, we show that CGD reduces refinement uncertainty and can be interpreted as a Bayesian posterior update. We perform an extensive empirical evaluation of CGD on a variety of benchmark tasks and demonstrate significant gains on both math (AMC23 +17.5%) and language understanding tasks (MMLU-Pro +6.3%), while successfully mitigating the format drift issues observed in previous critique fine-tuning (CFT) techniques.
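
The abstract describes the core training signal clearly enough to sketch in code. The snippet below is a minimal, illustrative reading of that setup in Python, not the authors' implementation: the student model name, the prompt template, and the choice to mask the loss to the refined-response tokens are all assumptions made for the sketch.

```python
# Minimal sketch of a CGD-style training loss (illustrative, not the paper's code).
# Assumptions: a causal-LM student, a plain-text prompt template, and loss
# restricted to the refined-response tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical student model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def cgd_loss(prompt, initial_response, critique, refined_response):
    """Supervised loss on the refined teacher response, conditioned on the
    triplet (prompt, student's own initial response, teacher critique)."""
    # Assumed prompt layout; the paper's exact template may differ.
    context = (
        f"Problem:\n{prompt}\n\n"
        f"Initial answer:\n{initial_response}\n\n"
        f"Critique:\n{critique}\n\n"
        f"Refined answer:\n"
    )
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(refined_response + tokenizer.eos_token,
                        return_tensors="pt",
                        add_special_tokens=False).input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    # Mask the context so only refined-response tokens contribute to the loss.
    labels = input_ids.clone()
    labels[:, : ctx_ids.size(1)] = -100
    return model(input_ids=input_ids, labels=labels).loss
```

A full training loop would batch such triplets, generate the student's initial responses on-policy, and back-propagate this loss; those details are not specified here and would follow the paper.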

Authors (4)
  1. Berkcan Kapusuzoglu (1 paper)
  2. Supriyo Chakraborty (26 papers)
  3. Chia-Hsuan Lee (12 papers)
  4. Sambit Sahu (6 papers)

