RMU Unlearning Methods

Updated 23 January 2026
  • RMU Unlearning is a framework for removing the influence of specified data from trained models, so that the unlearned model closely mimics one retrained from scratch on the remaining data.
  • The methodology employs adversarial fine-tuning, together with techniques such as FastClip for Lipschitz regularization, to match the retrained model's behavior on both remain and forget sets.
  • Empirical results on standard benchmarks (e.g., MNIST, CIFAR-10) show that RMU achieves near-identical performance to retraining and resists membership-inference attacks.

Reliable Machine Unlearning (RMU) is a set of algorithmic and theoretical methodologies for post hoc removal of the influence of specified data—random samples or entire classes—from trained machine learning models, with the goal of making the unlearned model closely mimic the behavior of an equivalent model retrained from scratch on the remaining data. RMU aims to provide strong empirical and, when possible, formal guarantees of indistinguishability in predictions, privacy, and resistance to inference attacks between retrained and unlearned models, across both remain and forget examples. Recent research has advanced RMU algorithms for classification settings, with extensions and analysis for distributed, adversarially robust, and membership-inference-resistant unlearning (Ebrahimpour-Boroojeny, 7 Dec 2025).

1. Formal Objective and Reliability Criteria

Let $D = \{(x_i, y_i)\}_{i=1}^N$ be a labeled training set, $\mathcal{Y} = \{1,\dots,m\}$ the label set, $D_F \subset D$ the forget set, and $D_R = D \setminus D_F$ the remain set. Let $F_\theta: \mathcal{X} \to \Delta^{m-1}$ denote the classifier parameterized by $\theta$, and $\theta_R$ the parameters obtained by retraining from scratch on $D_R$. The core criterion for RMU is that an unlearning map $\mathcal{M}(D, D_F, \theta) \mapsto \theta_u$ yields a model $F_{\theta_u}$ satisfying:

  • $|\text{ACC}_r(\theta_u) - \text{ACC}_r(\theta_R)| \approx 0$ (remain-set accuracy),
  • $|\text{ACC}_f(\theta_u) - \text{ACC}_f(\theta_R)| \approx 0$ (forget-set accuracy),
  • $\text{MIA}(\theta_u) \approx \text{MIA}(\theta_R)$ (membership-inference-attack indistinguishability).

These conditions are evaluated on remain and forget sets, with membership inference attack (MIA) metrics measuring the adversary’s ability to distinguish forgotten training data from held-out data (Ebrahimpour-Boroojeny, 7 Dec 2025).
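
For concreteness, the reliability gaps above can be measured directly. The sketch below is a minimal PyTorch-style illustration rather than code from the paper: the model and data-loader names are placeholders, and the MIA score is approximated by a simple loss-threshold attack (AUC near 0.5 means the attacker cannot distinguish forget samples from held-out data), which stands in for the specific attacks used in the reference.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over a data loader."""
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

@torch.no_grad()
def loss_threshold_mia(model, forget_loader, holdout_loader, device="cpu"):
    """Crude MIA proxy: AUC of a loss-threshold attack that tries to tell
    forget-set samples apart from held-out samples (~0.5 is the desired outcome)."""
    def losses(loader):
        out = []
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out.append(F.cross_entropy(model(x), y, reduction="none").cpu())
        return torch.cat(out)
    l_forget, l_holdout = losses(forget_loader), losses(holdout_loader)
    # AUC via pairwise comparison: member (forget) samples should have lower loss.
    diff = l_holdout.unsqueeze(0) - l_forget.unsqueeze(1)
    return ((diff > 0).float().mean() + 0.5 * (diff == 0).float().mean()).item()

def reliability_gaps(unlearned, retrained, remain_loader, forget_loader,
                     holdout_loader, device="cpu"):
    """Gaps that RMU requires to be ~0 between unlearned and retrained models."""
    return {
        "acc_r_gap": abs(accuracy(unlearned, remain_loader, device)
                         - accuracy(retrained, remain_loader, device)),
        "acc_f_gap": abs(accuracy(unlearned, forget_loader, device)
                         - accuracy(retrained, forget_loader, device)),
        "mia_gap": abs(loss_threshold_mia(unlearned, forget_loader, holdout_loader, device)
                       - loss_threshold_mia(retrained, forget_loader, holdout_loader, device)),
    }
```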

2. Adversarial Machine UNlearning (AMUN): Core Algorithm

AMUN is a foundational RMU algorithm that lowers confidence on forget samples by fine-tuning the model on adversarial examples specifically crafted for each sample in $D_F$. The adversarial procedure operates as follows:

  1. Adversarial Set Construction: For each forget sample $(x, y)$, generate $x_{\text{adv}} = x+\delta$ with $\|\delta\|\leq\varepsilon$, using an untargeted attack $\mathcal{A}$, such that $F_\theta(x_{\text{adv}}) \neq y$. The smallest $\varepsilon$ ensuring this misclassification is preferred to localize the decision-boundary modification.
  2. Fine-Tuning Objective: The updated model $\theta_u$ is obtained by minimizing:

$$\sum_{(x, y) \in D_R} \ell\big(F_\theta(x), y\big) + \eta \sum_{(x_{\text{adv}}, y') \in D_{\text{adv}}} \ell\big(F_\theta(x_{\text{adv}}), y'\big)$$

where $D_{\text{adv}}$ contains all crafted adversarial examples corresponding to the forget set, paired with their labels $y'$ (a minimal code sketch of this procedure follows the list below).

  3. Behavioral Similarity Evaluation: RMU procedures explicitly compare the unlearned model to the scratch-retrained model by the maximum absolute difference in softmax outputs ($L_\infty$-distance), as well as accuracy and MIA scores, requiring minimal deviation on both remain and forget sets (Ebrahimpour-Boroojeny, 7 Dec 2025).
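
As an illustration of steps 1–2, the sketch below crafts untargeted adversarial examples with a PGD attack, keeping the smallest budget from a small grid that flips the prediction, and then fine-tunes on the remain set plus these examples. PGD, the epsilon grid, the pixel range [0, 1], and the data handling are assumptions made for illustration, not the exact choices of AMUN.

```python
import torch
import torch.nn.functional as F

def min_eps_adversarial(model, x, y, eps_grid=(1/255, 2/255, 4/255, 8/255), steps=10):
    """Untargeted PGD on a batch (x, y); returns adversarial examples at the
    smallest budget in `eps_grid` that flips every prediction, since AMUN
    prefers minimal perturbations to keep the boundary edit local."""
    model.eval()
    x_adv = x.clone().detach()
    for eps in eps_grid:
        x_adv = x.clone().detach()
        alpha = eps / 4
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = (x_adv + alpha * grad.sign()).detach()
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the L-inf ball
            x_adv = x_adv.clamp(0, 1)                  # assumes inputs in [0, 1]
        with torch.no_grad():
            if (model(x_adv).argmax(dim=1) != y).all():
                break                                  # smallest working budget found
    return x_adv

def amun_finetune(model, remain_loader, adv_x, adv_y, optimizer,
                  eta=1.0, epochs=1, device="cpu"):
    """Fine-tune on the remain set plus D_adv, minimizing
    sum_R CE(F(x), y) + eta * sum_adv CE(F(x_adv), y');
    `adv_x`, `adv_y` hold the adversarial examples and their labels y'."""
    model.train()
    adv_x, adv_y = adv_x.to(device), adv_y.to(device)
    for _ in range(epochs):
        for x, y in remain_loader:
            x, y = x.to(device), y.to(device)
            idx = torch.randint(0, adv_x.size(0), (x.size(0),), device=device)
            loss = F.cross_entropy(model(x), y) \
                   + eta * F.cross_entropy(model(adv_x[idx]), adv_y[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```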

3. Smoothness Control: FastClip for Lipschitz Regularization

AMUN exploits model smoothness to improve unlearning fidelity. FastClip, a component of the RMU framework, projects affine layers to have bounded spectral norms, ensuring global Lipschitz control. For each affine layer $x \mapsto Mx + b$, the spectral norm $\|M\|_2$ is constrained (typically to 1), yielding:

$$\text{Lip}(F) \leq \prod_{\ell}\|M^{(\ell)}\|_2$$

Spectral norms are computed via PowerQR, an adapted power iteration with subspace orthogonalization. If any $\sigma_1 > C$, gradients are used to shrink $M$ appropriately, and SGD steps are interleaved with projection. This enforcement tightens adversarial-transfer bounds and enhances unlearning by making the local decision landscape match that of retraining more closely (Ebrahimpour-Boroojeny, 7 Dec 2025).
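
A simplified illustration of the projection step is sketched below. It uses plain power iteration and direct weight rescaling rather than the paper's PowerQR routine and gradient-based shrinkage, and it treats convolution kernels as flattened matrices, which only approximates their true operator norm; these simplifications are assumptions made for brevity.

```python
import torch

@torch.no_grad()
def spectral_norm_estimate(W, n_iters=20):
    """Estimate the largest singular value sigma_1 of a 2-D matrix W by
    plain power iteration (a simplified stand-in for PowerQR)."""
    v = torch.randn(W.shape[1], device=W.device)
    v = v / v.norm()
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u = u / (u.norm() + 1e-12)
        v = W.t() @ u
        v = v / (v.norm() + 1e-12)
    return torch.dot(u, W @ v).item()   # Rayleigh-quotient estimate of sigma_1

@torch.no_grad()
def clip_spectral_norms(model, C=1.0):
    """Rescale any weight whose estimated spectral norm exceeds C, keeping the
    product bound on Lip(F) under control."""
    for module in model.modules():
        W = getattr(module, "weight", None)
        if W is None or W.dim() < 2:
            continue
        sigma1 = spectral_norm_estimate(W.flatten(1))  # convs treated as matrices
        if sigma1 > C:
            W.mul_(C / sigma1)
```

In line with the description above, `clip_spectral_norms(model)` would be called after each optimizer step so that the projection is interleaved with SGD.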

4. Class Unlearning and Tilted ReWeighting (TRW)

For the regime where all samples of a class $y_f$ are to be forgotten, AMUN alone is insufficient: prior methods naively zero the probability of $y_f$ but fail to mimic the retrained model's typical misclassifications. This leaves them vulnerable to MIA-NN (nearest-neighbor) attacks, which exploit the systematic routing of forgotten examples to particular “neighboring” classes by retrained models.

RMU remedies this via Tilted ReWeighting (TRW):

  1. Compute the model’s softmax $p(y|x)$ and remove $y_f$ to form $\tilde{p}(y|x) = \frac{p(y|x)}{1-p(y_f|x)}$ for $y \neq y_f$.
  2. For each $y \neq y_f$, assign a similarity score $s_y$ (e.g., logit-weight cosine similarity with $y_f$).
  3. Define the “tilted” target:

$$q^*(y|x) = \frac{\tilde{p}(y|x)\,\exp(\beta s_y)}{\sum_{j\neq y_f} \tilde{p}(j|x)\,\exp(\beta s_j)}, \qquad q^*(y_f|x)=0$$

where $\beta$ tunes the similarity weighting.

  4. Fine-tune the model on the forget set to move the prediction toward $q^*$ via a cross-entropy (KL) loss. This ensures that the unlearned model’s pattern of misclassification on $D_F$ mirrors the retrained model’s, defeating nearest-neighbor attacks (Ebrahimpour-Boroojeny, 7 Dec 2025). A minimal sketch of this procedure is given below.
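
The sketch below illustrates steps 1–4 under simple assumptions: the similarity scores $s_y$ are taken as cosine similarities between last-layer weight rows (one possible instantiation of the logit-weight similarity mentioned above), $\beta$ is a free hyperparameter, the target $q^*$ is recomputed from the current model's softmax at each step, and all names and signatures are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_similarities(classifier_weight, forget_class):
    """s_y: cosine similarity between the last-layer weight row of class y
    and that of the forgotten class y_f (one possible similarity choice)."""
    w_f = classifier_weight[forget_class].unsqueeze(0)
    sim = F.cosine_similarity(classifier_weight, w_f, dim=1)
    sim[forget_class] = 0.0          # value unused: y_f is masked out below
    return sim

def tilted_targets(logits, forget_class, sim, beta=1.0):
    """Build q*: drop y_f from the softmax, renormalize, then tilt each
    remaining class y by exp(beta * s_y) and renormalize again."""
    p = F.softmax(logits, dim=1)
    mask = torch.ones_like(p)
    mask[:, forget_class] = 0.0
    p_tilde = (p * mask) / (p * mask).sum(dim=1, keepdim=True)
    q = p_tilde * torch.exp(beta * sim).unsqueeze(0) * mask
    return q / q.sum(dim=1, keepdim=True)

def trw_step(model, x_forget, forget_class, sim, optimizer, beta=1.0):
    """One TRW fine-tuning step on a batch of forget-set inputs: move the
    model's prediction toward the tilted target q* via a KL objective."""
    logits = model(x_forget)
    with torch.no_grad():
        q_star = tilted_targets(logits, forget_class, sim, beta)
    loss = F.kl_div(F.log_softmax(logits, dim=1), q_star, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `classifier_weight` would be the final linear layer's weight matrix (e.g., a hypothetical `model.fc.weight`), passed once to `class_similarities` and reused across TRW steps.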

5. Theoretical Guarantees and Analysis

Theoretical analysis reveals that AMUN’s ability to approximate the retrained model’s parameters depends on the model’s smoothness ($\beta$-smoothness), the Lipschitz constant $L$, and the effectiveness of adversarial transfer. For a single fine-tuning gradient step with adversarial example $x+\delta$, the parameter deviation from the retrained model can be bounded:

$$\|\theta' - \theta_R\|^2 \leq \|\theta_0 - \theta_R\|^2 + \frac{2}{\beta}\left[L\varepsilon - \Big(\ell(f_{\theta_0}(x+\delta),y) + \ell(f_{\theta'}(x+\delta),y') - \ell(f_{\theta_R}(x),y) - \ell(f_{\theta_R}(x+\delta),y')\Big)\right]$$

Smaller adversarial perturbations ($\varepsilon$), higher smoothness ($\beta$), and better adversarial transfer result in tighter closeness to retraining (Ebrahimpour-Boroojeny, 7 Dec 2025).

TRW's information-projection solution is shown to be unique by convexity of KL divergence under affine constraints.

6. Empirical Results and Practical Significance

Empirical studies on standard vision benchmarks (MNIST, CIFAR-10/100, Tiny-ImageNet-200) with deep architectures (ResNet-18, VGG-19, DLA) demonstrate that AMUN and TRW:

  • Achieve indistinguishable retain and forget accuracy compared to true retrain.
  • Reduce MIA and MIA-NN attack accuracy to near-random levels (e.g., on CIFAR-10 with ResNet-18: $\text{ACC}_r \approx 93.5\%$, $\text{ACC}_f \approx 0\%$, MIA-AUC $\approx 50\%$).
  • Successfully replicate retrained models’ softmax outputs, with the $\Delta_\infty$ metric (mean maximum absolute difference, sketched below) close to zero (Ebrahimpour-Boroojeny, 7 Dec 2025).
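
The $\Delta_\infty$ metric itself is simple to compute; a minimal sketch, assuming PyTorch models and a standard data loader, is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def delta_inf(unlearned, retrained, loader, device="cpu"):
    """Mean over samples of the L-infinity distance between the softmax
    outputs of the unlearned and scratch-retrained models."""
    dists = []
    for x, _ in loader:
        x = x.to(device)
        p_u = F.softmax(unlearned(x), dim=1)
        p_r = F.softmax(retrained(x), dim=1)
        dists.append((p_u - p_r).abs().max(dim=1).values.cpu())
    return torch.cat(dists).mean().item()
```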

Notably, FastClip adds <5% computational overhead and improves adversarial transfer, while combining AMUN with other data-free unlearning methods (e.g., SalUn) increases privacy in the no-retain-set regime.

7. Limitations, Open Problems, and Future Directions

  • Current analysis and guarantees are primarily empirical; a complete theoretical understanding of adversarial transfer in non-convex settings remains open.
  • TRW currently uses global similarity scores; sample-dependent similarities and higher-order moment constraints may better approximate retrain behavior.
  • Extending RMU to generative settings (GANs, diffusion models, LLMs) and concept-unlearning applications is an open research question.
  • Designing certified, privacy-guaranteed, post hoc unlearning protocols (e.g., DP-style bounds for after-the-fact deletion) is an active line of research (Ebrahimpour-Boroojeny, 7 Dec 2025).

In summary, Reliable Machine Unlearning seeks indistinguishability from retraining for both forget and remain data. The RMU framework leverages adversarial fine-tuning, model smoothness enforcement, and class-similarity redistribution to achieve strong empirical fidelity to retraining, mitigate membership-inference risk, and increase practical usability for privacy and compliance applications.
