Diffusion Model Unlearning by Memorization (DiMUM)

Updated 10 December 2025
  • DiMUM introduces a novel unlearning mechanism that convergently memorizes alternative data to irreversibly eliminate targeted concepts, ensuring resilience against relearning attacks.
  • It employs a dual loss strategy—combining retain loss and unlearning-by-memorization loss—to maintain data fidelity while actively severing ties with unlearned targets.
  • Empirical evaluations on CIFAR-10 demonstrate significant improvements in generative quality and reduced relearning accuracy compared to prior finetuning-based unlearning techniques.

Diffusion Model Unlearning by Memorization (DiMUM) is a machine unlearning methodology for conditional diffusion models, designed to remove unwanted concepts (data, classes, or features) in a manner that is provably robust against state-of-the-art relearning attacks. Unlike prior finetuning-based unlearning techniques, which exhibit vulnerability to model “relearning” under adversarial retraining, DiMUM achieves irreversible forgetting by explicitly forcing the model to memorize alternative data in place of the unlearning target. This approach represents a shift from negatively reinforcing the unwanted concepts to convergently memorizing their replacements, thereby disrupting reconstructability and generative fidelity toward the original unlearning targets (Yuan et al., 3 Dec 2025).

1. Motivation and Threat Model

Unlearning in generative models addresses privacy, copyright, and safety by ensuring specific data cannot be regenerated after user or regulatory deletion requests. Conventional methods for unlearning in diffusion models typically use either gradient ascent to penalize reconstruction of unwanted data or “swap-label” regularization, but both are readily susceptible to attack. The Diffusion Model Relearning Attack (DiMRA) demonstrates that these models can be fine-tuned using auxiliary datasets to restore the ability to generate the previously forgotten concepts—even when the attacker does not know the exact targets—because the model parameters after unlearning remain close to the pre-unlearning parameters (Yuan et al., 3 Dec 2025).

DiMUM is formulated for conditional diffusion models: given a pre-trained model with parameters $\theta_p$, the training data are partitioned into a retain set $D_r$ (to be preserved) and an unlearning set $D_u$ (to be forgotten). The adversary is assumed to have white-box access to the unlearned model $\theta_u$, knowledge of the conditioning space, and, under stronger or weaker assumptions, access to either the true retained data or an auxiliary distribution.
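As a minimal illustration of this setup, the snippet below partitions a class-conditional dataset into $D_r$ and $D_u$ by class label. The toy data and the `unlearn_class` choice are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple
import torch

def partition(dataset: List[Tuple[torch.Tensor, int]], unlearn_class: int):
    """Split (image, condition) pairs into a retain set D_r and an unlearning set D_u."""
    D_r = [(x, c) for x, c in dataset if c != unlearn_class]
    D_u = [(x, c) for x, c in dataset if c == unlearn_class]
    return D_r, D_u

# Toy example: forget one of ten classes from random stand-in data.
toy = [(torch.randn(3, 32, 32), i % 10) for i in range(100)]
D_r, D_u = partition(toy, unlearn_class=3)
```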

2. DiMUM Methodology

DiMUM achieves unlearning by convergently memorizing alternative data in place of the unlearning targets. The method is implemented via a finetuning procedure with distinct construction of data splits and loss terms:

  1. Data Partitioning
    • Unlearning set $D_u = \{(x_u, c_u)\}$: image/feature–condition pairs to be forgotten.
    • Retain set $D_r = \{(x_r, c_r)\}$: pairs to preserve.
  2. Alternative Pair Construction
    • Memorization set $D_u' = \{(x_r, c_u) \mid x_r \in D_r,\ c_u \in D_u\}$: for every $c_u$ in $D_u$, pair with a different image $x_r$ sampled from $D_r$, thus preventing the model from associating $c_u$ with its original $x_u$ (the unlearning target).
  3. Loss Functions
    • Retain loss:

    $$L_r(\theta) = \mathbb{E}_{t,\,(x_r,c_r)\in D_r,\,\epsilon} \bigl\| \epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_r + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t,\ c_r\bigr) \bigr\|^2$$

    This term keeps the model faithful to the benign data.
    • Unlearning-by-memorization loss:

    $$L_u(\theta) = \mathbb{E}_{t,\,(x_r,c_u)\in D_u',\,\epsilon} \bigl\| \epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_r + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t,\ c_u\bigr) \bigr\|^2$$

    This term guides the model to associate each unlearning prompt $c_u$ not with $x_u$ but with random images $x_r$ from $D_r$.
    • Total loss:

    $$L_{\mathrm{DiMUM}}(\theta) = \beta L_r(\theta) + L_u(\theta)$$

    with $\beta > 0$ controlling the quality vs. unlearning trade-off.

  4. Optimization

    • Standard Adam/SGD is used over a set number of unlearning steps, converging to a new local minimum that is both faithful on $D_r$ and convergently breaks the association on $D_u$ (Yuan et al., 3 Dec 2025); a minimal loss sketch follows below.
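The sketch below instantiates the two loss terms and their combination for a single minibatch, directly following the equations above. The noise-prediction interface `eps_model(x_t, t, c)` and the cumulative schedule tensor `alpha_bar` are assumptions made for illustration; the paper does not prescribe a specific implementation.

```python
import torch
import torch.nn.functional as F

def dimum_loss(eps_model, alpha_bar, x_r, c_r, x_r_alt, c_u, beta=1.0):
    """Return beta * L_r + L_u for one minibatch.

    x_r, c_r     : retain images and their true conditions (from D_r)
    x_r_alt, c_u : retain images paired with unlearn conditions (from D_u')
    Assumes both batches share the same size.
    """
    B = x_r.shape[0]
    t = torch.randint(0, alpha_bar.shape[0], (B,), device=x_r.device)
    a = alpha_bar[t].view(-1, 1, 1, 1)

    # Retain loss: standard epsilon-prediction objective on (x_r, c_r).
    eps = torch.randn_like(x_r)
    x_t = a.sqrt() * x_r + (1.0 - a).sqrt() * eps
    L_r = F.mse_loss(eps_model(x_t, t, c_r), eps)

    # Unlearning-by-memorization loss: the unlearn condition c_u is
    # paired with a different retain image x_r_alt, never its original x_u.
    eps_u = torch.randn_like(x_r_alt)
    x_t_u = a.sqrt() * x_r_alt + (1.0 - a).sqrt() * eps_u
    L_u = F.mse_loss(eps_model(x_t_u, t, c_u), eps_u)

    return beta * L_r + L_u
```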

3. Algorithmic Description and Practical Implementation

Pseudo-code summary (specialized for DiMUM):

for each training iteration:
    # Retain loss
    Sample batch (x_r, c_r) from D_r
    Compute perturbed input x_t
    Compute retain loss L_r

    # Unlearning-by-memorization loss
    Sample batch (x_r, c_u) from D_u' (x_r ≠ the x_u originally paired with c_u)
    Compute perturbed input x_t'
    Compute unlearning loss L_u

    # Total loss and parameter update
    L = beta * L_r + L_u
    theta = theta - eta * grad(L)

Proper construction of $D_u'$ (ensuring diversity and no overlap with the original $x_u$) is critical both for unlearning efficacy and to avoid residual memorization; a hedged construction sketch follows.
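One possible construction of $D_u'$ is sketched below. The `pairs_per_condition` knob and the without-replacement sampling are illustrative assumptions; the method only requires that each $c_u$ be paired with retain images rather than its original $x_u$.

```python
import random

def build_memorization_set(D_r, D_u, pairs_per_condition=8, seed=0):
    """Pair each unlearn condition c_u with random retain images x_r (never x_u)."""
    rng = random.Random(seed)
    retain_images = [x for x, _ in D_r]
    D_u_prime = []
    for _, c_u in D_u:
        # Sample distinct retain images per condition to encourage diversity.
        k = min(pairs_per_condition, len(retain_images))
        for x_r in rng.sample(retain_images, k):
            D_u_prime.append((x_r, c_u))
    return D_u_prime
```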

Hyperparameter tuning: $\beta \in [0.5, 2.0]$ preserves FID within $\pm 5\%$ on $D_r$ and offers control over post-unlearning robustness. Using as many unlearning steps as standard unlearning recipes (e.g., 1–2K for large-scale; up to 20K for smaller models) is necessary for full effect.

AR_DiMRA (Accuracy Rate after DiMRA) is a key metric for measuring resistance to relearning: lower is better; DiMUM regularly achieves $<0.06$ on CIFAR-10 after 40K steps (Yuan et al., 3 Dec 2025).
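A minimal sketch of how such an accuracy-rate metric can be computed is shown below: generate samples conditioned on the unlearned class and report the fraction a reference classifier still assigns to that class. The sampler interface `sample(model, c, n)` and the `classifier` are assumed interfaces, not specified by the paper.

```python
import torch

@torch.no_grad()
def accuracy_rate(sample, model, classifier, unlearn_class, n=500):
    """Fraction of class-conditional generations still classified as the unlearned class."""
    c = torch.full((n,), unlearn_class, dtype=torch.long)
    images = sample(model, c, n)              # (n, C, H, W) generated images
    preds = classifier(images).argmax(dim=1)  # reference classifier's labels
    return (preds == unlearn_class).float().mean().item()

# AR_MU    = accuracy_rate(...) evaluated on the unlearned model
# AR_DiMRA = accuracy_rate(...) evaluated after a simulated relearning finetune
```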

4. Comparison with Prior Machine Unlearning Techniques

DiMUM’s primary innovation is its convergent update mechanism, standing in contrast to:

  • Gradient ascent on the unlearning set (divergent, leaves parameters close to pre-unlearning state),
  • Swap-label regularization (typically non-convergent),
  • General-purpose unlearning by importance sampling (see SISS in (Alberti et al., 2 Mar 2025)) and inference- or training-time magnitude minimization (Wen et al., 31 Jul 2024).

These earlier approaches are fundamentally vulnerable to DiMRA, which exploits proximity to the pre-trained model to "relearn" the previously forgotten targets via additional standard finetuning on an auxiliary dataset, even without explicit knowledge of $D_u$ (a minimal attack sketch follows below). DiMUM, by convergently optimizing away from any memorization of $(x_u, c_u)$ and toward $(x_r, c_u)$ for random $x_r$, breaks this vulnerability and achieves stable, irreversible unlearning (Yuan et al., 3 Dec 2025).
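For concreteness, the sketch below shows a DiMRA-style relearning attack: ordinary diffusion finetuning of the unlearned model on an auxiliary loader, with no knowledge of $D_u$. The `eps_model`, `alpha_bar`, and `aux_loader` interfaces are assumptions carried over from the earlier sketches.

```python
import torch
import torch.nn.functional as F

def relearn_attack(eps_model, alpha_bar, aux_loader, steps=1000, lr=1e-4):
    """Finetune the unlearned model on auxiliary (x, c) data with the plain diffusion loss."""
    opt = torch.optim.Adam(eps_model.parameters(), lr=lr)
    it = iter(aux_loader)
    for _ in range(steps):
        try:
            x, c = next(it)
        except StopIteration:          # restart the loader when exhausted
            it = iter(aux_loader)
            x, c = next(it)
        t = torch.randint(0, alpha_bar.shape[0], (x.shape[0],), device=x.device)
        a = alpha_bar[t].view(-1, 1, 1, 1)
        eps = torch.randn_like(x)
        x_t = a.sqrt() * x + (1.0 - a).sqrt() * eps
        loss = F.mse_loss(eps_model(x_t, t, c), eps)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return eps_model
```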

5. Empirical Evaluation

a) CIFAR-10 Object Unlearning

  • One class is unlearned per experiment. Post-unlearning classifier accuracy on the removed class (AR_MU) is driven to $0\%$ for all methods, but after DiMRA, only DiMUM achieves low AR_DiMRA ($<0.3$ at 10K steps; $<0.06$ at 40K), versus 0.19–1.0 for baselines.
  • Generation quality (FID) is preserved best by DiMUM: e.g., FID $\approx 11.5$ after 20K unlearning steps versus $>16$ for baselines (Yuan et al., 3 Dec 2025).

b) Feature/style unlearning (UnlearnCanvas)

  • Unlearning “Van Gogh” style: DiMUM achieves the lowest AR_DiMRA (0.02 at 2K steps), indicating rare recovery of “Van Gogh” style after a simulated DiMRA attack, and the highest AR_CL (alternative style convergence accuracy). FID remains in the 42.7–43.1 range, comparable to the best baselines.

c) Sensitivity and Ablations

  • Number of unlearning steps, $\beta$ balance, and $|D_u'|$ size all affect trade-offs. More unlearning steps linearly decrease AR_DiMRA. A sufficiently large and diverse $D_u'$ is required for efficient convergence. Higher $\beta$ slows unlearning but favors FID retention.
Method   Unlearning Steps   FID ↓   AR_MU ↓   AR_DiMRA ↓
Salun    20K                ~16.0   0%        0.19–0.81
Sfront   2K                 >100    0%        0.97–1.00
DiMUM    20K                ~11.5   0%        0.03–0.25

Table: FID and AR (Accuracy Rate) metrics on CIFAR-10, reconstructed from results in (Yuan et al., 3 Dec 2025).

6. Robustness and Practical Considerations

DiMUM is architecture-agnostic and compatible with any noise-predicting conditional diffusion backbone (U-Net, Transformer, etc.) without modification. Its convergent quadratic loss structure ensures the updated model is not susceptible to model drift under further training or adversarial attacks such as DiMRA, which can easily defeat prior art.

Constructing $D_u'$ with maximal diversity and no residual correlation with $D_u$ or $c_u$ is necessary to avoid leakage of the forgotten data. Monitoring both FID/sFID (generation quality) and AR_CL (alternative convergence) during training is recommended.

Computationally, DiMUM adds only the overhead of constructing $D_u'$, which is linear in $|D_r| \times |D_u|$, and typical unlearning schedules (number of steps) are similar to other finetuning-based MU methods.

7. Limitations and Recommendations

DiMUM's efficacy is contingent on the assumption that $D_u'$ samples appropriately disrupt associations to $D_u$. An insufficiently diverse $D_u'$, or failure to fully randomize $x_r$ for each $c_u$, may result in incomplete unlearning. The approach does not guarantee elimination of subtler model behaviors, such as partial style transfer, unless the memorization phase sufficiently saturates the parameter space associated with the targeted unlearning features.

Recommended deployment includes:

  • Careful tuning of $\beta$ on held-out sets.
  • Verification via simulated DiMRA before release.
  • Routine monitoring of generative quality metrics and attack robustness.

In summary, DiMUM is a convergent, scalable, and theoretically robust methodology for machine unlearning in diffusion models that achieves irreversible forgetting by memorization, outperforming all existing finetuning-based methods on both generative quality and resistance to model relearning attacks (Yuan et al., 3 Dec 2025).
