
Critique-Guided Distillation (CGD)

Updated 17 January 2026
  • Critique-Guided Distillation (CGD) is a fine-tuning framework that integrates teacher-generated critiques to improve both answer accuracy and the underlying reasoning process.
  • It uses a three-stage distillation pipeline to refine noisy student responses, mitigating the imitation problem and preventing format drift in output answers.
  • The method leverages entropy reduction and Bayesian posterior updates to achieve enhanced model robustness and measurable gains in reasoning tasks.

Critique-Guided Distillation (CGD) is a supervised fine-tuning framework that improves upon conventional imitation-based learning protocols by integrating teacher-generated critiques into the distillation process, thereby enhancing the student model’s ability to internalize both the correct answer and its underlying rationale. CGD addresses the “imitation problem,” where standard fine-tuned LLMs reproduce responses without learning the latent reasoning process, and mitigates “format drift” observed in prior critique-based tuning methods by preserving the answer format at test time. The CGD procedure yields measurable gains in reasoning and general question answering tasks while grounding its efficacy in rigorous entropy-based uncertainty analysis and a Bayesian posterior update interpretation (Kapusuzoglu et al., 16 May 2025).

1. Motivation and Background

Standard supervised fine-tuning (SFT) trains a student model $S_\theta$ via maximum likelihood estimation to reproduce gold answers $y$ given prompts $x$. This direct imitation often causes the student to fail to generalize beyond the demonstration set and yields brittle behavior on harder out-of-distribution reasoning instances, the so-called "imitation problem." While Critique Fine-Tuning (CFT) partially improves reasoning quality by having the student mimic teacher-generated critiques $c$, it induces "format drift": the student's output distribution shifts from concise answer formats (e.g., $\boxed{42}$) toward verbose critique commentary, disrupting downstream inference pipelines. Additionally, CFT is sensitive to low-quality critiques, which may degrade or misguide learning more than inaccurate answer signals.

CGD is designed to overcome these limitations by ensuring that the student model learns not only to imitate the answer but also understands the rationale behind it, as encoded by explanatory critiques, without sacrificing its ability to output answers in the correct format. This dual conditioning fosters robust generalization and has empirically demonstrated marked performance improvements.

2. CGD Methodology

CGD implements a three-stage distillation pipeline:

  1. Initial Student Response: For a task prompt $x$, the pre-tuned student $S_{\theta_{\rm init}}$ generates a noisy answer $y_s \sim S_{\theta_{\rm init}}(y|x)$. This simulates authentic model errors and unrefined reasoning.
  2. Teacher Critique and Refinement: A fixed teacher model $T_\phi$ (e.g., LLaMA-3.3-70B Instruct) receives $(x, y_s)$ and produces a critique $c \sim T_\phi(c|x, y_s)$ followed by a refined answer $y_t \sim T_\phi(y_t|x, y_s, c)$. The critique explains the flaws or merits of $y_s$, and the refinement corrects or improves it.
  3. Student Distillation: The training set $\mathcal{D}' = \{(x, y_s, c, y_t)\}$ is constructed, and the student $S_\theta$ is fine-tuned to map $(x, y_s, c) \mapsto y_t$ via cross-entropy minimization. At test time, given only a prompt $x$, the student self-generates its own $y_s$ and $c$ internally, collapsing the multi-pass procedure into a single model invocation.

This architecture ensures the output remains in the answer format and leverages critique conditioning, achieving the dual objective of “what” to output and “why.”
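The three-stage data-generation pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `student_generate` and `teacher_generate` are hypothetical stand-ins for calls to the actual student (e.g., LLaMA-3.1-8B Instruct) and teacher (e.g., LLaMA-3.3-70B Instruct) models, and the prompt templates are invented for the sketch.

```python
# Sketch of the three-stage CGD data pipeline (illustrative only).
from typing import Callable, NamedTuple

class CGDExample(NamedTuple):
    prompt: str      # x
    draft: str       # y_s, noisy initial student answer
    critique: str    # c, teacher critique of the draft
    refined: str     # y_t, teacher-refined answer (training target)

def build_cgd_dataset(prompts,
                      student_generate: Callable[[str], str],
                      teacher_generate: Callable[[str], str]) -> list[CGDExample]:
    dataset = []
    for x in prompts:
        y_s = student_generate(x)                                             # stage 1
        c = teacher_generate(f"Critique this answer.\nQ: {x}\nA: {y_s}")      # stage 2a
        y_t = teacher_generate(                                               # stage 2b
            f"Q: {x}\nDraft: {y_s}\nCritique: {c}\nRefined answer:")
        dataset.append(CGDExample(x, y_s, c, y_t))                            # stage 3 input
    return dataset

# Toy stand-ins so the sketch runs end to end.
demo = build_cgd_dataset(
    ["What is 6*7?"],
    student_generate=lambda x: "41",
    teacher_generate=lambda p: ("The draft is off by one."
                                if "Refined" not in p else "42"),
)
```

The student would then be fine-tuned on `(prompt, draft, critique) -> refined` pairs from this dataset.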

3. Formal Training Objective

The CGD loss function is defined as

$$\mathcal{L}(\theta) = \mathbb{E}_{(x, y_s, c, y_t)\sim\mathcal{D}'}\Bigl[ -\log S_\theta(y_t \mid x, y_s, c) \Bigr]$$

where

  • $\theta$: student parameters,
  • $\phi$: teacher (fixed) parameters,
  • $y_s \sim S_{\theta_{\rm init}}(y|x)$,
  • $c \sim T_\phi(c|x, y_s)$,
  • $y_t \sim T_\phi(y_t|x, y_s, c)$.

This loss can be interpreted as an expectation over the initial student response, teacher critique, and teacher-refined answer, requiring the student to reconstruct $y_t$ from the contextualized inputs. The nested expectation captures both the stochasticity of sampling and the conditioning structure inherent in the framework.
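A Monte-Carlo estimate of this objective reduces to averaging the negative log-likelihood the student assigns to $y_t$ over sampled tuples. The sketch below is a toy illustration with an assumed `student_prob` callable (not the paper's implementation), which stands in for the student model's sequence probability $S_\theta(y_t \mid x, y_s, c)$:

```python
# Toy Monte-Carlo estimate of L(theta) = E[-log S_theta(y_t | x, y_s, c)].
import math

def cgd_loss(examples, student_prob):
    """Average negative log-likelihood of y_t over (x, y_s, c, y_t) tuples."""
    return sum(-math.log(student_prob(x, y_s, c, y_t))
               for (x, y_s, c, y_t) in examples) / len(examples)

# Hypothetical student that puts 0.8 probability mass on the refined answer.
toy = [("What is 6*7?", "41", "off by one", "42")]
loss = cgd_loss(toy, student_prob=lambda x, ys, c, yt: 0.8)
# loss == -ln(0.8), roughly 0.223
```

In practice this would be the usual token-level cross-entropy over $y_t$, with $(x, y_s, c)$ supplied as the conditioning context.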

4. Theoretical Foundations

CGD’s effectiveness is underwritten by two principal analyses:

a. Entropy-Based Uncertainty Reduction

Let $Y$ denote the student's answer before fine-tuning. The conditional entropy $H(Y|X)$ measures answer uncertainty given prompt $X$. Conditioning on the teacher's critique $C$ can never increase uncertainty:

$$H(Y|X) \geq H(Y|X,C)$$

Equivalently, the KL-divergence between the gold-label distribution $P$ and the student prediction decreases when conditioning on $C$:

$$\mathrm{KL}(P \| Q(Y|X)) \geq \mathrm{KL}(P \| Q(Y|X, C))$$

This narrows the hypothesis space, enabling more confident and accurate predictions.
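The entropy inequality can be verified numerically on a toy joint distribution $p(y, c)$ for a fixed prompt $x$; the probability values below are invented for illustration:

```python
# Numeric check that H(Y|X) >= H(Y|X,C) on a toy joint distribution.
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Joint p(y, c): rows index critiques c1/c2, columns index answers y1/y2.
joint = [[0.45, 0.05],   # c1: critique points strongly to y1
         [0.05, 0.45]]   # c2: critique points strongly to y2

p_y = [sum(row[j] for row in joint) for j in range(2)]   # marginal p(y|x)
h_y_given_x = entropy(p_y)                               # H(Y|X) = 1 bit here

# H(Y|X,C) = sum_c p(c) * H(Y|X, C=c)
h_y_given_xc = sum(sum(row) * entropy([v / sum(row) for v in row])
                   for row in joint)

assert h_y_given_xc <= h_y_given_x   # conditioning never increases entropy
```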

b. Bayesian Posterior Update Interpretation

The initial answer distribution $S_{\theta_{\rm init}}(y|x)$ serves as a prior, with the teacher's critique acting as evidence. By Bayes' rule:

$$S_\theta(y|x, y_s, c) \propto T_\phi(c|x, y_s, y) \times S_{\theta_{\rm init}}(y|x, y_s)$$

In parameter space:

$$p(\theta|\mathcal{D}, c) \propto p(c|\mathcal{D}, \theta)\, p(\theta|\mathcal{D})$$

Conditioning on critiques thus reweights the posterior over model parameters, making learning more data-efficient.
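The answer-space update can be made concrete with a toy discrete example (the numbers are invented): the student's prior over candidate answers is multiplied by the critique likelihood and renormalized, shifting mass toward the answer the critique supports.

```python
# Toy Bayesian update: prior S_init(y | x, y_s) reweighted by the
# critique likelihood T_phi(c | x, y_s, y). Values are illustrative only.
prior = {"41": 0.6, "42": 0.4}                  # student's initial belief
critique_likelihood = {"41": 0.1, "42": 0.9}    # how well each y explains c

unnorm = {y: critique_likelihood[y] * prior[y] for y in prior}
z = sum(unnorm.values())
posterior = {y: p / z for y, p in unnorm.items()}
# Mass shifts toward "42": posterior["42"] = 0.36 / 0.42 = 6/7
```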

5. Empirical Evaluation

CGD was evaluated on diverse datasets and benchmarks:

  • Training Sets: WebInstruct (100 K samples; multi-domain) and MetaMathQA (100 K samples; math-specific).
  • Benchmarks:
    • Math Reasoning: MATH500, Minerva-Math, GSM8K, OlympiadBench, AMC23.
    • General Reasoning: TheoremQA, GPQA, MMLU-Pro.
    • QA Tasks: IFEval, MUSR, TruthfulQA, BIG-Bench Hard.
  • Models: Teacher—LLaMA-3.3-70B Instruct; Student—LLaMA-3.1-8B Instruct.
  • Baselines: (i) Standard SFT; (ii) Distilled SFT on refined answers; (iii) CFT predicting teacher critiques.

Key Results:

| Task | SFT (%) | CFT (%) | CGD (%) | Absolute Gain (CGD − SFT) |
|---|---|---|---|---|
| AMC23 | 20.0 | 22.5 | 37.5 | +17.5 |
| MMLU-Pro | 39.3 | 34.2 | 40.3 | +6.1 |
| IFEval | (not listed) | 55.6 | 76.1 | n/a |

CGD consistently outperformed both SFT and CFT, achieving a 5.4% average gain over the strongest CFT baseline across math reasoning tasks while preserving QA ability (e.g., IFEval: 76.1% with CGD vs. 55.6% for CFT). Training used 16×A100 GPUs, a batch size of 64, and a learning rate of $1 \times 10^{-6}$; metrics were exact-match accuracy averaged over three seeds.
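The exact-match metric reported above can be sketched as a simple normalized string comparison; the normalization rule here (strip whitespace, lowercase) is an assumption for illustration, as the paper's exact normalization is not specified:

```python
# Sketch of exact-match accuracy: a prediction counts as correct only if
# its normalized final answer equals the normalized gold answer.
def exact_match(preds, golds):
    norm = lambda s: s.strip().lower()
    return sum(norm(p) == norm(g) for p, g in zip(preds, golds)) / len(golds)

acc = exact_match(["42", " 7 ", "blue"], ["42", "7", "red"])
# acc == 2/3
```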

6. Comparative Analysis

a. Mitigation of Format Drift

CFT’s “generate critique” objective causes output distribution shifts—format drift—from direct answers to critique-like outputs, which can break downstream tools. CGD circumvents this by conditioning on critiques while supervising the production of final answers, thus maintaining answer format fidelity throughout training and inference.

b. Importance of Critique Conditioning and Robustness

Ablation studies confirmed that removing critiques from the CGD input ("CGD w/o c") resulted in notable accuracy declines on challenging tasks (Minerva-Math, AMC23, MMLU-Pro). CGD exhibited robustness to learning-rate variation (stable between $1 \times 10^{-6}$ and $5 \times 10^{-6}$), unlike CFT, which dropped by more than 9 points at the higher learning rate. Entropy analyses demonstrated consistently lower conditional entropy and reduced KL-divergence to gold answers under CGD, reflecting more confident and accurate model predictions.

7. Contributions, Limitations, and Future Directions

a. Main Contributions

  1. Introduction of a multi-stage CGD pipeline integrating explanatory critiques without altering answer format.
  2. Theoretical justification via entropy reduction and Bayesian posterior interpretation.
  3. Empirical evidence of substantial absolute improvements on math and language understanding tasks with no additional test-time overhead.

b. Limitations

  • Generation of $(y_s, c, y_t)$ triplets incurs extra computational overhead.
  • CGD’s effectiveness depends on the quality of teacher-produced critiques.
  • Experiments were limited to the LLaMA model family and selected reasoning/QA benchmarks.

c. Future Work

  • Automated estimation or filtering of critique quality to mitigate misleading feedback.
  • Extension of CGD to multimodal domains or tool-backed contingent critiques.
  • Engineering safety- or bias-focused critiques for enhanced model alignment and robustness.

CGD represents a principled and empirically validated approach to combining answer correctness and explanatory feedback in LLM training, providing new opportunities for robust supervised fine-tuning and further methodological refinement (Kapusuzoglu et al., 16 May 2025).
