DUET: Distilled Unlearning from an Efficient Teacher

Updated 5 February 2026
  • The paper introduces DUET, which achieves selective and persistent forgetting by distilling teacher guidance into a student model without retraining from scratch.
  • It combines competent teacher predictions on retained data with randomized outputs on forget data via an in-context and adapter-driven framework.
  • DUET is validated across deep nets and LLMs, offering scalable, data-efficient, and robust unlearning with improved metrics over conventional retraining.

Distilled Unlearning from an Efficient Teacher (DUET) is a family of algorithms for machine unlearning that achieve selective, persistent forgetting through a distillation-based student-teacher framework. DUET combines the efficiency of in-context steering with the robustness and durability of parameter updates, enabling the removal of targeted knowledge (such as specific user data, private facts, or hazardous content) from large neural networks, particularly LLMs, while preserving the utility of retained knowledge. DUET formulations apply across deep neural architectures and unify prior approaches spanning explicit retraining, AMNESIAC learning, and prompt steering, offering scalable, data-efficient, and attack-resilient unlearning protocols (Chundawat et al., 2022, Chen et al., 2023, Zhong et al., 29 Jan 2026).

1. Formal Problem Statement and Motivation

The machine unlearning objective is: given a trained model $M$ (or LLM $\pi_0$) with parameters $\theta$, a full training set $\mathcal{D}$, and a designated forget set $\mathcal{D}_f \subset \mathcal{D}$ (e.g., private or undesirable data), efficiently update the model to obtain $M_{\text{unlearn}}$ such that: (i) the influence of $\mathcal{D}_f$ is eradicated ("forgets"), and (ii) predictive quality on a retain set $\mathcal{D}_r = \mathcal{D}\setminus\mathcal{D}_f$ is preserved, without retraining from scratch.

Prior tuning-based unlearning (e.g., parameter optimization using a negated loss over $\mathcal{D}_f$) is computationally intensive and susceptible to catastrophic forgetting if the balancing term is naively tuned. In-context unlearning via prompt steering is lightweight, but the knowledge removal is superficial and can be bypassed. DUET bridges these extremes by distilling the refusal or randomization behavior of an efficiently contextualized teacher (a teacher model or prompt-steered LLM) into a persistent student model, achieving robust, parameter-centric forgetting (Chundawat et al., 2022, Chen et al., 2023, Zhong et al., 29 Jan 2026).

2. DUET Methodological Frameworks

In deep nets, DUET initializes a student $S$ from the original model $M$ (the "competent teacher" $T_s$) and trains it on a union of forget ($\mathcal{D}_f$) and retain ($\mathcal{D}_r$) batches. Each sample $x$ is labeled $l_u(x)=1$ if $x\in\mathcal{D}_f$ (forget) or $l_u(x)=0$ otherwise (retain). The student imitates $T_s$ on $\mathcal{D}_r$ using KL divergence and an "incompetent" (random or weak) teacher $T_d$ on $\mathcal{D}_f$, injecting targeted randomness as follows:

$$L(x,\,l_u) = (1 - l_u)\,\mathrm{KL}\big(T_s(x)\,\|\,S(x)\big) + l_u\,\mathrm{KL}\big(T_d(x)\,\|\,S(x)\big)$$

By alternately minimizing this loss over $\mathcal{D}_f$ and $\mathcal{D}_r$, DUET erases knowledge on $\mathcal{D}_f$ by randomizing predictions (via $T_d$), while retaining fidelity on $\mathcal{D}_r$ by matching $T_s$.
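The batched loss can be sketched as follows, a minimal NumPy illustration assuming probability vectors for the student and both teachers are already available (in practice these would come from softmax outputs of the respective networks):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q), row-wise, for probability vectors on the last axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def duet_loss(student, competent, incompetent, l_u):
    """L(x, l_u) = (1 - l_u) KL(T_s || S) + l_u KL(T_d || S), batch mean.

    student / competent / incompetent: (batch, classes) probabilities.
    l_u: (batch,) array, 1 for forget samples, 0 for retain samples.
    """
    retain_term = kl(competent, student)    # imitate competent teacher T_s
    forget_term = kl(incompetent, student)  # imitate incompetent teacher T_d
    return float(np.mean((1.0 - l_u) * retain_term + l_u * forget_term))
```

On retain samples only the first term is active, pulling the student toward $T_s$; on forget samples only the second term is active, pulling it toward the randomized $T_d$.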

For LLMs, DUET (Chen et al., 2023) ("Efficient Unlearning"/EUL) inserts tiny, trainable "unlearning adapters" $f(\cdot;W)$ into each Transformer block of a frozen teacher $F(\cdot)$, forming the student $F'(x) = F(f(x;W))$. The optimization objective is multi-term, balancing selective KL divergence, retention loss, and anti-memorization (negated masked-LM loss) on the respective data splits:

$$\begin{aligned}
L_{\mathrm{KL}} &= \alpha \sum_{(x,y)\in \mathcal{D}^r} \mathrm{KL}\big(F(x)\,\|\,F'(x)\big) - \sum_{x\in \mathcal{D}^f} \mathrm{KL}\big(F(x)\,\|\,F'(x)\big) \\
L_{\mathrm{TASK}} &= \sum_{(x,y)\in \mathcal{D}^r} \ell_{\mathrm{task}}\big(F'(x),\,y\big) \\
L_{\mathrm{LM}} &= -\sum_{x\in \mathcal{D}^f} \ell_{\mathrm{LM}}\big(F'(x)\big) \\
L_{\mathrm{DUET}} &= L_{\mathrm{KL}} + \lambda L_{\mathrm{TASK}} + \gamma L_{\mathrm{LM}}
\end{aligned}$$

Alternating optimization steps on $\mathcal{D}^r$ and $\mathcal{D}^f$ ensure both effective forgetting and knowledge retention.
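As a schematic sketch (not the paper's implementation), once the per-example component losses have been computed from forward passes of $F$ and $F'$, the combined objective is a signed, weighted sum:

```python
import numpy as np

def eul_objective(kl_retain, kl_forget, task_retain, lm_forget,
                  alpha=1.0, lam=1.0, gamma=1.0):
    """L_DUET = L_KL + lam * L_TASK + gamma * L_LM, where
    L_KL   =  alpha * sum(KL on retain) - sum(KL on forget)
    L_TASK =  sum(task loss on retain)
    L_LM   = -sum(masked-LM loss on forget)   # anti-memorization term
    """
    l_kl = alpha * np.sum(kl_retain) - np.sum(kl_forget)
    l_task = np.sum(task_retain)
    l_lm = -np.sum(lm_forget)
    return float(l_kl + lam * l_task + gamma * l_lm)
```

The signs matter: minimizing this objective pushes $F'$ toward $F$ on retain data while pushing it away from $F$ (and raising the masked-LM loss) on forget data.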

In the latest DUET formulation (Zhong et al., 29 Jan 2026), a prompt-steered teacher LLM ("in-context refusal") guides the forgetting process. For each query $x\in\mathcal{D}_f\cup\mathcal{D}_r$, the teacher's first-token logits $g^i_{\mathrm{ref}}(x)$ (computed with the refusal prefix $x_{ic}$ prepended) are selectively distilled into a fully parameterized student $\pi_\theta$ (no prefix) using a Huber-L1 regression on the Top-$K$ candidate logits:

$$L_{\mathrm{DUET}}(\theta) = \mathbb{E}_{x\sim(\mathcal{D}_f\cup\mathcal{D}_r)}\left[\sum_{i \in \mathcal{C}_K(x)} \ell\big(g^i_\theta(x),\,g^i_{\mathrm{ref}}(x_{ic} \oplus x)\big)\right]$$

This logit-centric loss embeds in-context refusal behavior into model parameters, producing persistent forgetting robust to prompt removal or reset.
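A minimal sketch of the Top-$K$ Huber-L1 regression over a single query's first-token logits (function names are illustrative; real training would backpropagate through the student's logits):

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic within |r| <= delta, linear (L1) in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def topk_distill_loss(student_logits, teacher_logits, k=5):
    """Regress student first-token logits onto the prompt-steered teacher's,
    restricted to the teacher's top-k candidate tokens C_K(x)."""
    cand = np.argsort(teacher_logits)[-k:]  # candidate set C_K(x)
    return float(np.sum(huber(student_logits[cand] - teacher_logits[cand])))
```

Restricting the regression to the candidate set concentrates the update on the tokens that actually carry the teacher's refusal signal, rather than matching the full vocabulary distribution.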

3. Sequential Unlearning and Fusion

DUET supports accumulation of multiple, non-overlapping unlearning operations without destructive interference. In (Chen et al., 2023), separate adapter sets $W_i$ are trained for each forget request $\mathcal{D}^f_i$ and then linearly fused by solving:

$$\min_{W_m} \sum_i \left\| X^f_i W_m - X^f_i W_i \right\|^2$$

with the closed-form solution

$$W_m = \Big(\sum_i (X^f_i)^\top X^f_i\Big)^{-1} \sum_i (X^f_i)^\top X^f_i\, W_i$$

where $X^f_i$ are the hidden representations for $\mathcal{D}^f_i$ just before the adapter. This mechanism supports efficient, post hoc fusion without additional backpropagation, enabling responsive deployment in settings with streaming deletion requests.
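The closed-form fusion can be sketched directly in NumPy (using `np.linalg.solve` instead of an explicit inverse for numerical stability):

```python
import numpy as np

def fuse_adapters(X_list, W_list):
    """W_m = (sum_i X_i^T X_i)^{-1} sum_i (X_i^T X_i W_i).

    X_list[i]: (n_i, d) hidden representations for forget request i.
    W_list[i]: (d, d_out) adapter weights trained for request i.
    """
    gram = sum(X.T @ X for X in X_list)                  # sum_i X_i^T X_i
    rhs = sum(X.T @ X @ W for X, W in zip(X_list, W_list))
    return np.linalg.solve(gram, rhs)                    # avoids explicit inverse
```

With a single request the fusion reduces to that request's adapter; with several, each $W_i$ is weighted by how much its own forget representations activate it.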

4. Quantitative Evaluation Protocols

Evaluation uses a variety of metrics to ensure that forgetting and retention are jointly quantified.

  • Zero Retrain Forgetting (ZRF) Metric (Chundawat et al., 2022): Uses the Jensen–Shannon divergence between the unlearned model and the incompetent teacher on $\mathcal{D}_f$:

$$\mathrm{ZRF} = 1 - \frac{1}{n_f}\sum_{i=1}^{n_f}\mathrm{JS}\big(M(x_i),\,T_d(x_i)\big)$$

Values near 1 indicate the model mimics random guessing on forgotten samples.

  • Task-Specific Metrics (Chen et al., 2023, Zhong et al., 29 Jan 2026):
    • Classification accuracy on test, retain, and forget splits.
    • ROUGE-L F1 on QA sets for forgetting (R-Forget, low desired), utility retention (R-Retain, high desired).
    • MMLU multi-choice accuracy for general capabilities.
    • Masked-LM loss on forget data, membership inference, and attack resilience.
  • Efficiency: Adapter-based and logit-distilled DUET variants are orders of magnitude faster and dramatically more data- and compute-efficient than tuning-based or retrain-from-scratch baselines.
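The ZRF metric above can be sketched as follows, assuming the unlearned model's and incompetent teacher's probabilities on the forget set are precomputed (JS divergence taken in base 2 so it is bounded in $[0,1]$):

```python
import numpy as np

def js(p, q, eps=1e-12):
    """Jensen-Shannon divergence, base 2 (bounded in [0, 1]), row-wise."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b), axis=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def zrf(unlearned_probs, incompetent_probs):
    """ZRF = 1 - (1/n_f) * sum_i JS(M(x_i), T_d(x_i)) over the forget set."""
    return float(1.0 - np.mean(js(unlearned_probs, incompetent_probs)))
```

A ZRF of 1 means the unlearned model's forget-set predictions are indistinguishable from the incompetent (random) teacher's.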

5. Empirical Findings

Extensive experimental studies spanning image classification (Chundawat et al., 2022), summarization and sentiment analysis (Chen et al., 2023), and knowledge-based QA (Zhong et al., 29 Jan 2026) establish that DUET achieves state-of-the-art trade-offs between forgetting (reduction of knowledge on Df\mathcal{D}_f) and retention (maintenance of Dr\mathcal{D}_r utility):

| Model / Method | Forget Accuracy (↓) | Retain Accuracy (↑) | Utility (MMLU/ROUGE, ↑) | Training Time |
| --- | --- | --- | --- | --- |
| Retrain (gold) | Near-chance | ≈100% | Highest | High |
| DUET (adapters) | 4–57% (domain/task) | 71–99% | Matches gold | 2–20× less than retraining |
| Incompetent teacher | ~random | ≈ baseline | Maintains generalization | – |

On MUSE-Books with Llama-3.2B, DUET achieves R-Forget = 4.27 (vs. 32.13 for the base model), R-Retain = 78.33 (84.29 base), and MMLU 61.45 (61.46 base), with a joint score improvement of approximately 55.90 over tuning-based and flat baselines. On WMDP-Bio/Cyber, DUET achieves the lowest Acc-Forget and highest MMLU relative to established methods (Zhong et al., 29 Jan 2026).

Sequential fusion ("DUET-fuse") consistently reduces forget-set accuracy while preserving test accuracy, outperforming sequential fine-tuning (Chen et al., 2023).

6. Limitations, Robustness, and Open Questions

DUET does not provide formal PAC-style or differential-privacy guarantees; empirical effectiveness is certified via surface metrics and attack simulations. Sophisticated jailbreak or reverse-engineering attacks retain non-trivial success rates (ASR ≈ 35%; Zhong et al., 29 Jan 2026), indicating residual extractable knowledge. The precise boundary between safe and forbidden knowledge remains under-defined and relies on the quality and specificity of the refusal prompts (Zhong et al., 29 Jan 2026). Current evaluation regimes depend on output-level verification; deeper latent-space auditing and membership-inference probing are proposed directions. Compute scalability for continuous/delete-all streaming and federated settings remains unresolved.

7. Significance and Comparative Analysis

DUET is the first unlearning strategy to unify (a) the efficiency and semantic specificity of prompt/in-context teacher construction and (b) the parameter persistence of fine-tuning or adapter-style updates. It achieves state-of-the-art forgetting–retention trade-offs, with high data efficiency (~2k tokens per request), robustness to reverse prompt attacks, and infrastructure for post hoc sequential fusion (Chundawat et al., 2022, Chen et al., 2023, Zhong et al., 29 Jan 2026). A notable implication is that logit-level distillation from efficiently contextualized teachers embeds targeted refusal behavior directly into model parameters, bridging the gap between ephemeral steering and heavyweight retraining.

Further research may explore extension to regression, structured prediction, federated removal, robust adversarial unlearning, and principled certification of unlearning beyond empirical benchmarks.
