RMU Unlearning Methods

Updated 23 January 2026
  • RMU Unlearning is a framework for removing the influence of specified data from trained models, so that the unlearned model closely mimics one retrained from scratch on the remaining data.
  • The methodology employs adversarial fine-tuning, together with techniques such as FastClip for Lipschitz regularization, to match the retrained model's behavior on both remain and forget sets.
  • Empirical results on standard benchmarks (e.g., MNIST, CIFAR-10) show that RMU achieves near-identical performance to retraining and resists membership-inference attacks.

Reliable Machine Unlearning (RMU) is a set of algorithmic and theoretical methodologies for post hoc removal of the influence of specified data—random samples or entire classes—from trained machine learning models, with the goal of making the unlearned model closely mimic the behavior of an equivalent model retrained from scratch on the remaining data. RMU aims to provide strong empirical and, when possible, formal guarantees of indistinguishability in predictions, privacy, and resistance to inference attacks between retrained and unlearned models, across both remain and forget examples. Recent research has advanced RMU algorithms for classification settings, with extensions and analysis for distributed, adversarially robust, and membership-inference-resistant unlearning (Ebrahimpour-Boroojeny, 7 Dec 2025).

1. Formal Objective and Reliability Criteria

Let $D = \{(x_i, y_i)\}_{i=1}^N$ be a labeled training set, $\mathcal{Y} = \{1,\dots,m\}$ the label set, $D_F \subset D$ the forget set, and $D_R = D \setminus D_F$ the remain set. Let $F_\theta: \mathcal{X} \to \Delta^{m-1}$ denote the classifier parameterized by $\theta$, and $\theta_R$ the parameters obtained by retraining from scratch on $D_R$. The core criterion for RMU is that an unlearning map $\mathcal{M}(D, D_F, \theta) \mapsto \theta_u$ yields a model $F_{\theta_u}$ satisfying:

  • $|\text{ACC}_r(\theta_u) - \text{ACC}_r(\theta_R)| \approx 0$ (remain-set accuracy),
  • $|\text{ACC}_f(\theta_u) - \text{ACC}_f(\theta_R)| \approx 0$ (forget-set accuracy),
  • $\text{MIA}(\theta_u) \approx \text{MIA}(\theta_R)$ (membership-inference-attack indistinguishability).

These conditions are evaluated on remain and forget sets, with membership inference attack (MIA) metrics measuring the adversary’s ability to distinguish forgotten training data from held-out data (Ebrahimpour-Boroojeny, 7 Dec 2025).
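
For concreteness, the reliability gaps above can be measured directly. The sketch below is a minimal PyTorch-style illustration rather than code from the paper: the model and data-loader names are placeholders, and the MIA score is approximated by a simple loss-threshold attack (AUC near 0.5 means the attacker cannot distinguish forget samples from held-out data), which stands in for the specific attacks used in the reference.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over a data loader."""
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

@torch.no_grad()
def loss_threshold_mia(model, forget_loader, holdout_loader, device="cpu"):
    """Crude MIA proxy: AUC of a loss-threshold attack that tries to tell
    forget-set samples apart from held-out samples (~0.5 is the desired outcome)."""
    def losses(loader):
        out = []
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out.append(F.cross_entropy(model(x), y, reduction="none").cpu())
        return torch.cat(out)
    l_forget, l_holdout = losses(forget_loader), losses(holdout_loader)
    # AUC via pairwise comparison: member (forget) samples should have lower loss.
    diff = l_holdout.unsqueeze(0) - l_forget.unsqueeze(1)
    return ((diff > 0).float().mean() + 0.5 * (diff == 0).float().mean()).item()

def reliability_gaps(unlearned, retrained, remain_loader, forget_loader,
                     holdout_loader, device="cpu"):
    """Gaps that RMU requires to be ~0 between unlearned and retrained models."""
    return {
        "acc_r_gap": abs(accuracy(unlearned, remain_loader, device)
                         - accuracy(retrained, remain_loader, device)),
        "acc_f_gap": abs(accuracy(unlearned, forget_loader, device)
                         - accuracy(retrained, forget_loader, device)),
        "mia_gap": abs(loss_threshold_mia(unlearned, forget_loader, holdout_loader, device)
                       - loss_threshold_mia(retrained, forget_loader, holdout_loader, device)),
    }
```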

2. Adversarial Machine UNlearning (AMUN): Core Algorithm

AMUN is a foundational RMU algorithm that lowers confidence on forget samples by fine-tuning the model on adversarial examples specifically crafted for each sample in $D_F$. The adversarial procedure operates as follows:

  1. Adversarial Set Construction: For each forget sample $(x, y)$, generate $x_{\text{adv}} = x+\delta$ with $\|\delta\|\leq\varepsilon$, using an untargeted attack $\mathcal{A}$, such that $F_\theta(x_{\text{adv}}) \neq y$. The smallest $\varepsilon$ ensuring this misclassification is preferred to localize the decision-boundary modification.
  2. Fine-Tuning Objective: The updated model $\theta_u$ is obtained by minimizing:

$$\sum_{(x, y) \in D_R} \ell\big(F_\theta(x), y\big) + \eta \sum_{(x_{\text{adv}}, y') \in D_{\text{adv}}} \ell\big(F_\theta(x_{\text{adv}}), y'\big)$$

where $D_{\text{adv}}$ contains all crafted adversarial examples corresponding to the forget set, paired with their labels $y'$ (a minimal code sketch of this procedure follows the list below).

  3. Behavioral Similarity Evaluation: RMU procedures explicitly compare the unlearned model to the scratch-retrained model by the maximum absolute difference in softmax outputs ($L_\infty$-distance), as well as accuracy and MIA scores, requiring minimal deviation on both remain and forget sets (Ebrahimpour-Boroojeny, 7 Dec 2025).
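
As an illustration of steps 1–2, the sketch below crafts untargeted adversarial examples with a PGD attack, keeping the smallest budget from a small grid that flips the prediction, and then fine-tunes on the remain set plus these examples. PGD, the epsilon grid, the pixel range [0, 1], and the data handling are assumptions made for illustration, not the exact choices of AMUN.

```python
import torch
import torch.nn.functional as F

def min_eps_adversarial(model, x, y, eps_grid=(1/255, 2/255, 4/255, 8/255), steps=10):
    """Untargeted PGD on a batch (x, y); returns adversarial examples at the
    smallest budget in `eps_grid` that flips every prediction, since AMUN
    prefers minimal perturbations to keep the boundary edit local."""
    model.eval()
    x_adv = x.clone().detach()
    for eps in eps_grid:
        x_adv = x.clone().detach()
        alpha = eps / 4
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = (x_adv + alpha * grad.sign()).detach()
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the L-inf ball
            x_adv = x_adv.clamp(0, 1)                  # assumes inputs in [0, 1]
        with torch.no_grad():
            if (model(x_adv).argmax(dim=1) != y).all():
                break                                  # smallest working budget found
    return x_adv

def amun_finetune(model, remain_loader, adv_x, adv_y, optimizer,
                  eta=1.0, epochs=1, device="cpu"):
    """Fine-tune on the remain set plus D_adv, minimizing
    sum_R CE(F(x), y) + eta * sum_adv CE(F(x_adv), y');
    `adv_x`, `adv_y` hold the adversarial examples and their labels y'."""
    model.train()
    adv_x, adv_y = adv_x.to(device), adv_y.to(device)
    for _ in range(epochs):
        for x, y in remain_loader:
            x, y = x.to(device), y.to(device)
            idx = torch.randint(0, adv_x.size(0), (x.size(0),), device=device)
            loss = F.cross_entropy(model(x), y) \
                   + eta * F.cross_entropy(model(adv_x[idx]), adv_y[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```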

3. Smoothness Control: FastClip for Lipschitz Regularization

AMUN exploits model smoothness to improve unlearning fidelity. FastClip, a component of the RMU framework, projects affine layers to have bounded spectral norms, ensuring global Lipschitz control. For each affine layer $x \mapsto Mx + b$, the spectral norm $\|M\|_2$ is constrained (typically to 1), yielding:

$$\text{Lip}(F) \leq \prod_{\ell}\|M^{(\ell)}\|_2$$

Spectral norms are computed via PowerQR, an adapted power iteration with subspace orthogonalization. If any $\sigma_1 > C$, gradients are used to shrink $M$ appropriately, and SGD steps are interleaved with projection. This enforcement tightens adversarial-transfer bounds and enhances unlearning by making the local decision landscape match that of retraining more closely (Ebrahimpour-Boroojeny, 7 Dec 2025).
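
A simplified illustration of the projection step is sketched below. It uses plain power iteration and direct weight rescaling rather than the paper's PowerQR routine and gradient-based shrinkage, and it treats convolution kernels as flattened matrices, which only approximates their true operator norm; these simplifications are assumptions made for brevity.

```python
import torch

@torch.no_grad()
def spectral_norm_estimate(W, n_iters=20):
    """Estimate the largest singular value sigma_1 of a 2-D matrix W by
    plain power iteration (a simplified stand-in for PowerQR)."""
    v = torch.randn(W.shape[1], device=W.device)
    v = v / v.norm()
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u = u / (u.norm() + 1e-12)
        v = W.t() @ u
        v = v / (v.norm() + 1e-12)
    return torch.dot(u, W @ v).item()   # Rayleigh-quotient estimate of sigma_1

@torch.no_grad()
def clip_spectral_norms(model, C=1.0):
    """Rescale any weight whose estimated spectral norm exceeds C, keeping the
    product bound on Lip(F) under control."""
    for module in model.modules():
        W = getattr(module, "weight", None)
        if W is None or W.dim() < 2:
            continue
        sigma1 = spectral_norm_estimate(W.flatten(1))  # convs treated as matrices
        if sigma1 > C:
            W.mul_(C / sigma1)
```

In line with the description above, `clip_spectral_norms(model)` would be called after each optimizer step so that the projection is interleaved with SGD.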

4. Class Unlearning and Tilted ReWeighting (TRW)

For the regime where all samples of a class $y_f$ are to be forgotten, AMUN alone is insufficient: prior methods naively zero the probability of $y_f$ but fail to mimic the retrained model's typical misclassifications. This leaves them vulnerable to MIA-NN (nearest-neighbor) attacks, which exploit the systematic routing of forgotten examples to particular “neighboring” classes by retrained models.

RMU remedies this via Tilted ReWeighting (TRW):

  1. Compute the model’s softmax $p(y|x)$ and remove $y_f$ to form $\tilde{p}(y|x) = \frac{p(y|x)}{1-p(y_f|x)}$ for $y \neq y_f$.
  2. For each $y \neq y_f$, assign a similarity score $s_y$ (e.g., logit-weight cosine similarity with $y_f$).
  3. Define the “tilted” target:

$$q^*(y|x) = \frac{\tilde{p}(y|x)\,\exp(\beta s_y)}{\sum_{j\neq y_f} \tilde{p}(j|x)\,\exp(\beta s_j)}, \qquad q^*(y_f|x)=0$$

where $\beta$ tunes the similarity weighting.

  4. Fine-tune the model on the forget set to move the prediction toward $q^*$ via a cross-entropy (KL) loss. This ensures that the unlearned model’s pattern of misclassification on $D_F$ mirrors the retrained model’s, defeating nearest-neighbor attacks (Ebrahimpour-Boroojeny, 7 Dec 2025). A minimal sketch of this procedure is given below.
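
The sketch below illustrates steps 1–4 under simple assumptions: the similarity scores $s_y$ are taken as cosine similarities between last-layer weight rows (one possible instantiation of the logit-weight similarity mentioned above), $\beta$ is a free hyperparameter, the target $q^*$ is recomputed from the current model's softmax at each step, and all names and signatures are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_similarities(classifier_weight, forget_class):
    """s_y: cosine similarity between the last-layer weight row of class y
    and that of the forgotten class y_f (one possible similarity choice)."""
    w_f = classifier_weight[forget_class].unsqueeze(0)
    sim = F.cosine_similarity(classifier_weight, w_f, dim=1)
    sim[forget_class] = 0.0          # value unused: y_f is masked out below
    return sim

def tilted_targets(logits, forget_class, sim, beta=1.0):
    """Build q*: drop y_f from the softmax, renormalize, then tilt each
    remaining class y by exp(beta * s_y) and renormalize again."""
    p = F.softmax(logits, dim=1)
    mask = torch.ones_like(p)
    mask[:, forget_class] = 0.0
    p_tilde = (p * mask) / (p * mask).sum(dim=1, keepdim=True)
    q = p_tilde * torch.exp(beta * sim).unsqueeze(0) * mask
    return q / q.sum(dim=1, keepdim=True)

def trw_step(model, x_forget, forget_class, sim, optimizer, beta=1.0):
    """One TRW fine-tuning step on a batch of forget-set inputs: move the
    model's prediction toward the tilted target q* via a KL objective."""
    logits = model(x_forget)
    with torch.no_grad():
        q_star = tilted_targets(logits, forget_class, sim, beta)
    loss = F.kl_div(F.log_softmax(logits, dim=1), q_star, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `classifier_weight` would be the final linear layer's weight matrix (e.g., a hypothetical `model.fc.weight`), passed once to `class_similarities` and reused across TRW steps.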

5. Theoretical Guarantees and Analysis

Theoretical analysis reveals that AMUN’s ability to approximate the retrained model’s parameters depends on the model’s smoothness ($\beta$-smoothness), the Lipschitz constant $L$, and the effectiveness of adversarial transfer. For a single fine-tuning gradient step with adversarial example $x+\delta$, the parameter deviation from the retrained model can be bounded:

$$\|\theta' - \theta_R\|^2 \leq \|\theta_0 - \theta_R\|^2 + \frac{2}{\beta}\left[L\varepsilon - \Big(\ell(f_{\theta_0}(x+\delta),y) + \ell(f_{\theta'}(x+\delta),y') - \ell(f_{\theta_R}(x),y) - \ell(f_{\theta_R}(x+\delta),y')\Big)\right]$$

Smaller adversarial perturbations ($\varepsilon$), higher smoothness ($\beta$), and better adversarial transfer result in tighter closeness to retraining (Ebrahimpour-Boroojeny, 7 Dec 2025).

TRW's information-projection solution is shown to be unique by convexity of KL divergence under affine constraints.

6. Empirical Results and Practical Significance

Empirical studies on standard vision benchmarks (MNIST, CIFAR-10/100, Tiny-ImageNet-200) with deep architectures (ResNet-18, VGG-19, DLA) demonstrate that AMUN and TRW:

  • Achieve indistinguishable retain and forget accuracy compared to true retrain.
  • Reduce MIA and MIA-NN attack accuracy to near-random levels (e.g., on CIFAR-10 with ResNet-18: $\text{ACC}_r \approx 93.5\%$, $\text{ACC}_f \approx 0\%$, MIA-AUC $\approx 50\%$).
  • Successfully replicate retrained models’ softmax outputs, with the $\Delta_\infty$ metric (mean maximum absolute difference, sketched below) close to zero (Ebrahimpour-Boroojeny, 7 Dec 2025).
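
The $\Delta_\infty$ metric itself is simple to compute; a minimal sketch, assuming PyTorch models and a standard data loader, is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def delta_inf(unlearned, retrained, loader, device="cpu"):
    """Mean over samples of the L-infinity distance between the softmax
    outputs of the unlearned and scratch-retrained models."""
    dists = []
    for x, _ in loader:
        x = x.to(device)
        p_u = F.softmax(unlearned(x), dim=1)
        p_r = F.softmax(retrained(x), dim=1)
        dists.append((p_u - p_r).abs().max(dim=1).values.cpu())
    return torch.cat(dists).mean().item()
```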

Notably, FastClip adds <5% computational overhead and improves adversarial transfer, while combining AMUN with other data-free unlearning methods (e.g., SalUn) increases privacy in the no-retain-set regime.

7. Limitations, Open Problems, and Future Directions

  • Current analysis and guarantees are primarily empirical; a complete theoretical understanding of adversarial transfer in non-convex settings remains open.
  • TRW currently uses global similarity scores; sample-dependent similarities and higher-order moment constraints may better approximate retrain behavior.
  • Extending RMU to generative settings (GANs, diffusion models, LLMs) and concept-unlearning applications is an open research question.
  • Designing certified, privacy-guaranteed, post hoc unlearning protocols (e.g., DP-style bounds for after-the-fact deletion) is an active line of research (Ebrahimpour-Boroojeny, 7 Dec 2025).

In summary, Reliable Machine Unlearning seeks indistinguishability from retraining for both forget and remain data. The RMU framework leverages adversarial fine-tuning, model smoothness enforcement, and class-similarity redistribution to achieve strong empirical fidelity to retraining, mitigate membership-inference risk, and increase practical usability for privacy and compliance applications.
