Diffusion Model Unlearning by Memorization (DiMUM)
- DiMUM introduces an unlearning mechanism that forces the model to convergently memorize alternative data in place of targeted concepts, irreversibly eliminating them and ensuring resilience against relearning attacks.
- It employs a dual loss strategy—combining retain loss and unlearning-by-memorization loss—to maintain data fidelity while actively severing ties with unlearned targets.
- Empirical evaluations on CIFAR-10 demonstrate significant improvements in generative quality and reduced relearning accuracy compared to prior finetuning-based unlearning techniques.
Diffusion Model Unlearning by Memorization (DiMUM) is a machine unlearning methodology for conditional diffusion models, designed to remove unwanted concepts (data, classes, or features) in a manner that is provably robust against state-of-the-art relearning attacks. Unlike prior finetuning-based unlearning techniques, which exhibit vulnerability to model “relearning” under adversarial retraining, DiMUM achieves irreversible forgetting by explicitly forcing the model to memorize alternative data in place of the unlearning target. This approach represents a shift from negatively reinforcing the unwanted concepts to convergently memorizing their replacements, thereby disrupting reconstructability and generative fidelity toward the original unlearning targets (Yuan et al., 3 Dec 2025).
1. Motivation and Threat Model
Unlearning in generative models addresses privacy, copyright, and safety by ensuring specific data cannot be regenerated after user or regulatory deletion requests. Conventional methods for unlearning in diffusion models typically use either gradient ascent to penalize reconstruction of unwanted data or “swap-label” regularization, but both are readily susceptible to attack. The Diffusion Model Relearning Attack (DiMRA) demonstrates that these models can be fine-tuned using auxiliary datasets to restore the ability to generate the previously forgotten concepts—even when the attacker does not know the exact targets—because the model parameters after unlearning remain close to the pre-unlearning parameters (Yuan et al., 3 Dec 2025).
DiMUM is formulated for conditional diffusion models in which a pre-trained model with parameters $\theta$ has its training data partitioned into a retain set $D_r$ (to be preserved) and an unlearning set $D_u$ (to be forgotten). The adversary is assumed to have white-box access to the unlearned model, knowledge of the conditioning space, and, under stronger or weaker assumptions, access to either the true retained data or an auxiliary distribution.
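As a concrete illustration of the threat, the following minimal sketch shows what a DiMRA-style attack amounts to under the assumptions above. All names are illustrative rather than taken from the paper: `unlearned_model` is a noise predictor with an assumed `(x_t, t, c)` call signature, `aux_loader` yields auxiliary image/condition batches, and `scheduler` follows a diffusers-style `add_noise` interface.

```python
import itertools

import torch
import torch.nn.functional as F

def dimra_relearn(unlearned_model, aux_loader, scheduler, steps=1000, lr=1e-4):
    """Sketch of a DiMRA-style relearning attack: simply resume standard
    diffusion finetuning on an auxiliary dataset, exploiting the fact that
    conventionally unlearned parameters stay near the pre-unlearning optimum."""
    opt = torch.optim.Adam(unlearned_model.parameters(), lr=lr)
    batches = itertools.cycle(aux_loader)
    for _ in range(steps):
        x0, c = next(batches)                       # auxiliary image/condition batch
        t = torch.randint(0, scheduler.num_train_timesteps, (x0.size(0),))
        noise = torch.randn_like(x0)
        xt = scheduler.add_noise(x0, noise, t)      # forward diffusion q(x_t | x_0)
        loss = F.mse_loss(unlearned_model(xt, t, c), noise)  # standard eps-prediction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return unlearned_model
```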
2. DiMUM Methodology
DiMUM achieves unlearning by convergently memorizing alternative data in place of the unlearning targets. The method is implemented via a finetuning procedure with distinct construction of data splits and loss terms:
- Data Partitioning
- Unlearning set $D_u$: image/feature–condition pairs $(x_u, c_u)$ to be forgotten.
- Retain set $D_r$: pairs $(x_r, c_r)$ to preserve.
- Alternative Pair Construction
- Memorization set $D_u'$: for every $c_u$ in $D_u$, pair $c_u$ with a different image $x_r$ sampled from $D_r$, thus preventing the model from associating $c_u$ with its original $x_u$ (the unlearning target); a construction sketch follows this subsection.
- Loss Functions
- Retain loss:
$\mathcal{L}_r = \mathbb{E}_{(x_r, c_r) \sim D_r,\; t,\; \epsilon \sim \mathcal{N}(0, I)}\big[\|\epsilon_\theta(x_t, t, c_r) - \epsilon\|_2^2\big]$,
where $x_t$ is the forward-diffused $x_r$ at timestep $t$. This term keeps the model faithful to the benign data.
- Unlearning-by-memorization loss:
$\mathcal{L}_u = \mathbb{E}_{(x_r, c_u) \sim D_u',\; t,\; \epsilon \sim \mathcal{N}(0, I)}\big[\|\epsilon_\theta(x_t, t, c_u) - \epsilon\|_2^2\big]$
This guides the model to associate each unlearning prompt $c_u$ not with $x_u$ but instead with random images from $D_r$.
- Total loss:
$\mathcal{L} = \beta \mathcal{L}_r + \mathcal{L}_u$,
with $\beta$ controlling the quality vs. unlearning trade-off.
- Optimization
- Standard Adam/SGD is used over a set number of unlearning steps, converging to a new local minimum that is both faithful on $D_r$ and convergently breaks the association on $D_u$ (Yuan et al., 3 Dec 2025).
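A minimal sketch of the alternative-pair construction referenced above, assuming the datasets are in-memory lists of (image, condition) tuples; the function name and interface are illustrative, not the authors' code.

```python
import random

def build_memorization_set(D_u, D_r, seed=0):
    """Pair every unlearning condition c_u with a randomly drawn retained
    image x_r, severing the original (x_u, c_u) association."""
    rng = random.Random(seed)
    retained_images = [x for x, _ in D_r]
    D_u_prime = []
    for _x_u, c_u in D_u:
        x_r = rng.choice(retained_images)   # fresh random replacement per target
        D_u_prime.append((x_r, c_u))
    return D_u_prime
```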
3. Algorithmic Description and Practical Implementation
Pseudo-code summary (specialized for DiMUM):
```
for step in range(num_unlearning_steps):
    # Retain loss: standard denoising objective on the retain set D_r
    x_r, c_r = sample_batch(D_r)
    x_t, eps, t = perturb(x_r)                   # forward-diffuse x_r with noise eps at timestep t
    L_r = mse(eps_theta(x_t, t, c_r), eps)

    # Unlearning-by-memorization loss: unlearning conditions c_u paired with
    # replacement images from D_u' (each replacement is a retained image != the paired x_u)
    x_r_prime, c_u = sample_batch(D_u_prime)
    x_t_prime, eps_prime, t_prime = perturb(x_r_prime)
    L_u = mse(eps_theta(x_t_prime, t_prime, c_u), eps_prime)

    # Total loss and parameter update
    L = beta * L_r + L_u
    theta = theta - eta * grad(L, theta)
```
Proper construction of $D_u'$ (ensuring diversity and no overlap with the original $x_u$) is critical for both unlearning efficacy and to avoid residual memorization.
Hyperparameter tuning: $\beta$ trades off FID preservation on $D_r$ against post-unlearning robustness. Using at least as many unlearning steps as standard unlearning recipes (e.g., 1–2K for large-scale models; up to 20K for smaller ones) is necessary for full effect.
AR_DiMRA (Accuracy Rate after DiMRA) is a key metric for measuring resistance to relearning: lower is better, and DiMUM regularly achieves low values on CIFAR-10 after 40K steps (Yuan et al., 3 Dec 2025).
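A plausible reading of this metric as code, assuming a reverse-diffusion `sampler` and a pretrained external `classifier` (both hypothetical callables, not the paper's evaluation harness); the same routine doubles as AR_MU when run directly on the unlearned model.

```python
import torch

@torch.no_grad()
def accuracy_rate(model, sampler, classifier, target_class, n_samples=500):
    """Fraction of generations under the unlearned condition that an external
    classifier still recognizes as the target class. Run right after
    unlearning this plays the role of AR_MU; run after a simulated DiMRA
    finetune it plays the role of AR_DiMRA (lower is better in both cases)."""
    cond = torch.full((n_samples,), target_class, dtype=torch.long)
    images = sampler(model, cond)               # reverse-diffusion sampling (assumed API)
    preds = classifier(images).argmax(dim=1)
    return (preds == cond).float().mean().item()
```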
4. Comparison with Prior Machine Unlearning Techniques
DiMUM’s primary innovation is its convergent update mechanism, standing in contrast to:
- Gradient ascent on the unlearning set (divergent, leaves parameters close to pre-unlearning state),
- Swap-label regularization (typically non-convergent),
- General-purpose unlearning by importance sampling (see SISS in (Alberti et al., 2 Mar 2025)) and inference- or training-time magnitude minimization (Wen et al., 31 Jul 2024).
These earlier approaches are fundamentally vulnerable to DiMRA, which exploits proximity to the pre-trained model to “relearn” the previously forgotten targets by additional standard finetuning on an auxiliary dataset, even without explicit knowledge of $D_u$. DiMUM breaks this vulnerability by convergently optimizing away from any memorization of $(x_u, c_u)$ and toward replacement pairs $(x_r, c_u)$ for random $x_r \sim D_r$, achieving stable, irreversible unlearning (Yuan et al., 3 Dec 2025).
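The contrast can be made concrete with two toy loss terms; the helper names are hypothetical, and the bodies assume tensor inputs (e.g., PyTorch) holding noise predictions on a batch drawn from the unlearning set.

```python
def gradient_ascent_loss(eps_pred, eps):
    # Divergent objective: minimizing this maximizes the denoising error on
    # D_u. It is unbounded below, so training must be stopped early and the
    # parameters remain near the pre-unlearning optimum, which is exactly
    # the proximity that DiMRA exploits.
    return -((eps_pred - eps) ** 2).mean()

def memorization_loss(eps_pred_on_replacement, eps):
    # Convergent objective: an ordinary regression toward the noise targets
    # of the replacement pairs (x_r, c_u). It has a well-defined minimum, so
    # the model settles into a new local optimum away from the forgotten data.
    return ((eps_pred_on_replacement - eps) ** 2).mean()
```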
5. Empirical Evaluation
a) CIFAR-10 Object Unlearning
- One class is unlearned per experiment. Post-unlearning classifier accuracy on the removed class (AR_MU) is driven to 0% for all methods, but after DiMRA, only DiMUM maintains low AR_DiMRA, decreasing further between 10K and 40K unlearning steps, versus $0.19$–$1.0$ for baselines.
- Generation quality (FID) is preserved best by DiMUM: e.g., FID $\approx 11.5$ after 20K unlearning steps versus $\approx 16$ for baselines (Yuan et al., 3 Dec 2025).
b) Feature/style unlearning (UnlearnCanvas)
- Unlearning “Van Gogh” style: DiMUM achieves the lowest AR_DiMRA ($0.02$ at 2K steps), indicating that the “Van Gogh” style is rarely recovered after a simulated DiMRA attack, and the highest AR_CL (alternative style convergence accuracy). FID remains in the $42.7$–$43.1$ range, comparable to the best baselines.
c) Sensitivity and Ablations
- Number of unlearning steps, $\beta$ balance, and $D_u'$ size all affect the tradeoffs. More unlearning steps linearly decrease AR_DiMRA. A sufficiently large and diverse $D_u'$ is required for efficient convergence. Higher $\beta$ slows unlearning but favors FID retention.
| Method | Unlearning Steps | FID ↓ | AR_MU ↓ | AR_DiMRA ↓ |
|---|---|---|---|---|
| Salun | 20K | ~16.0 | 0% | 0.19–0.81 |
| Sfront | 2K | 100 | 0% | 0.97–1.00 |
| DiMUM | 20K | ~11.5 | 0% | 0.03–0.25 |
Table: FID and AR (Accuracy Rate) metrics on CIFAR-10, reconstructed from results in (Yuan et al., 3 Dec 2025).
6. Robustness and Practical Considerations
DiMUM is architecture-agnostic and compatible with any noise-predicting conditional diffusion backbone (U-Net, Transformer, etc.) without modification. Its convergent quadratic loss structure ensures the updated model is not susceptible to model drift under further training or adversarial attacks such as DiMRA, which can easily defeat prior art.
Constructing $D_u'$ with maximal diversity and no residual correlation with the original $x_u$ or their features is necessary to avoid leakage of the forgotten data. Monitoring both FID/sFID (generation quality) and AR_CL (alternative convergence) during training is recommended.
Computationally, DiMUM adds only the overhead of constructing $D_u'$, which is linear in $|D_u|$, and typical unlearning schedules (number of steps) are similar to other finetuning-based MU methods.
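A sketch of such monitoring, assuming hypothetical `sampler`, `classifier`, and `fid_fn` callables; note that the off-target rate below is only a rough proxy for AR_CL, which properly measures convergence onto the specific replacement data.

```python
import torch

@torch.no_grad()
def monitor(step, model, sampler, classifier, fid_fn, unlearn_class,
            n_samples=256, every=1000, log=print):
    """Periodically track generation quality (FID on retained concepts) and
    how much mass under the unlearning condition has moved off the
    forgotten class during training."""
    if step % every:
        return
    cond = torch.full((n_samples,), unlearn_class, dtype=torch.long)
    preds = classifier(sampler(model, cond)).argmax(dim=1)
    off_target = (preds != unlearn_class).float().mean().item()
    log(f"step={step} FID={fid_fn(model):.2f} off-target rate={off_target:.3f}")
```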
7. Limitations and Recommendations
DiMUM's efficacy is contingent on the assumption that the $D_u'$ samples appropriately disrupt associations to $D_u$. An insufficiently diverse $D_u'$, or failure to fully randomize the replacement $x_r$ for each $c_u$, may result in incomplete unlearning. The approach does not guarantee elimination of more subtle model behaviors, such as partial style transfer, unless the “memorization” phase sufficiently saturates the parameter space associated with the targeted unlearning features.
Recommended deployment includes:
- Careful tuning of $\beta$ on held-out sets.
- Verification via simulated DiMRA before release (a sketch follows this list).
- Routine monitoring of generative quality metrics and attack robustness.
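Combining the earlier sketches (`dimra_relearn` from Section 1 and `accuracy_rate` from Section 3), a pre-release verification step might look like the following; the 0.05 threshold is an arbitrary placeholder to be calibrated per deployment.

```python
import copy

def verify_before_release(model, aux_loader, scheduler, sampler,
                          classifier, target_class, threshold=0.05):
    """Run a simulated DiMRA against a throwaway copy of the unlearned model
    and block the release if the forgotten class becomes generable again."""
    attacked = dimra_relearn(copy.deepcopy(model), aux_loader, scheduler)
    ar_dimra = accuracy_rate(attacked, sampler, classifier, target_class)
    if ar_dimra >= threshold:
        raise RuntimeError(f"relearning risk detected: AR_DiMRA={ar_dimra:.3f}")
    return ar_dimra
```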
In summary, DiMUM is a convergent, scalable, and theoretically robust methodology for machine unlearning in diffusion models that achieves irreversible forgetting by memorization, outperforming all existing finetuning-based methods on both generative quality and resistance to model relearning attacks (Yuan et al., 3 Dec 2025).