Zero-Shot Machine Unlearning

Updated 13 March 2026

This overview covers zero-shot machine unlearning techniques, which remove the influence of targeted data without accessing the original training set, using methods such as Hessian estimation and adversarial noise.
Methodologies span parametric corrections, synthetic adversarial frameworks, and architectural masking to balance robust forgetting with minimal performance degradation.
Empirical evaluations on benchmarks such as CIFAR-10 and ImageNet demonstrate significant computational speedups and close approximations to full retraining in terms of unlearning efficacy.

Zero-shot machine unlearning comprises a class of methodologies designed to efficiently remove the influence of specific data (e.g., individual samples, whole classes, or user identities) from trained models, under the stringent constraint of not requiring access to the original training set or, sometimes, any data at all except the explicit forget set. Such methods are crucial for privacy and regulatory compliance in contemporary machine learning systems, especially where data residency, sensitivity, or operational constraints preclude standard retraining or data replay. Modern approaches span parametric, representation, and architecture-level interventions, provide varying forms of theoretical guarantees, and achieve a spectrum of trade-offs between forgetting, performance retention, and scalability.

1. Formal Problem Settings and Unlearning Guarantees

Zero-shot machine unlearning is defined by the following constraints: starting from a pretrained model with parameters $\theta^*$, access is provided only to the model parameters and to the forget set $D_f$ (or, in some regimes, only class or index labels), while the retained set $D_r$ is not available. The objective is to obtain unlearned parameters $\theta_{uf}$ such that for any $x$ in $D_f$, the prediction fidelity of $f_{\theta_{uf}}(x)$ is minimized (forgetting), while for $x$ in $D_r$, predictions stay as close as possible to those of $f_{\theta^*}$ or of an ideal retrained model $f_{\theta^r}$.

Theoretical unlearning guarantees are expressed in terms of parameter- or output-indistinguishability from a model explicitly retrained on $D_r$, explicit bounds on the gradient norm over $D_r$, or information-theoretic proxies (e.g., reduction in mutual information between the model and $D_f$) (Ahmed et al., 20 Aug 2025, Almudévar et al., 29 Jan 2026, Foster et al., 2024). Typical results take the form:

$\|\nabla_\theta L(\theta_{uf}; D_r)\|_2 \leq O\left(\frac{n_f^2}{n_r}\right)$

where $n_f = |D_f|$, $n_r = |D_r|$, and $n = n_f + n_r$. The bound indicates that $\theta_{uf}$ converges towards the retrained optimum as the forget fraction $n_f/n$ decreases, provided the Hessian/proxy estimation error is controlled.

2. Main Methodological Paradigms

2.1 Parametric Correction via Unknown Retain-data Hessian Estimation

A central paradigm is an influence-function (Newton-style) correction adapted to the case where $D_r$ is unknown: the Hessian $H_r$ of $L(\theta; D_r)$ is estimated using only perturbations and statistics derived from $D_f$ (Ahmed et al., 20 Aug 2025). The method draws $m$ random Gaussian parameter perturbations $\delta\theta_i$, sets $\theta_i = \theta^* + \delta\theta_i$, records the forget-loss changes $\Delta \ell_f(\theta_i) = L(\theta_i; D_f) - L(\theta^*; D_f)$, and matches second-order Taylor expansions of the loss to solve for $H_r$ via a semidefinite program:

$\hat H_r = \arg\min_{X \succeq 0} \frac{1}{m}\sum_{i=1}^m \left(\frac{1}{2} \delta\theta_i^T X \delta\theta_i - g_f^T \delta\theta_i - \Delta \ell_f(\theta_i)\right)^2$

After solving for $\hat H_r$, the unlearning update is:

$\theta_{uf} = \theta^* - \hat H_r^{-1} g_f$

where $g_f = \nabla_\theta L(\theta^*; D_f)$. This approach yields unlearning with provable fidelity to ideal retraining, provided the Hessian estimation error is controlled (Ahmed et al., 20 Aug 2025).
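The fit-then-update procedure above can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' implementation: the semidefinite program is replaced by an unconstrained least-squares fit followed by eigenvalue clipping onto the PSD cone, and the loss changes are synthetic, constructed so that a known matrix `X_true` attains zero residual in the objective as written. The helper name `fit_psd_quadratic` and all toy values are hypothetical.

```python
import numpy as np

def fit_psd_quadratic(deltas, g_f, delta_losses):
    """Fit symmetric X minimizing sum_i (0.5*d_i^T X d_i - g_f^T d_i - dl_i)^2,
    then clip negative eigenvalues (a simple stand-in for a true SDP solver)."""
    m, p = deltas.shape
    rows, cols = np.triu_indices(p)
    # 0.5*d^T X d is linear in the upper-triangular entries of X:
    # coefficient 0.5*d_j^2 on the diagonal, d_j*d_k off the diagonal.
    outer = deltas[:, :, None] * deltas[:, None, :]
    feats = outer[:, rows, cols] * np.where(rows == cols, 0.5, 1.0)
    targets = delta_losses + deltas @ g_f
    coef, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    X = np.zeros((p, p))
    X[rows, cols] = coef
    X = X + X.T - np.diag(np.diag(X))
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

rng = np.random.default_rng(0)
p, m = 4, 200
A = rng.normal(size=(p, p))
X_true = A @ A.T + np.eye(p)            # synthetic PSD "Hessian" (assumption)
g_f = rng.normal(size=p)                # synthetic forget gradient (assumption)
deltas = 0.1 * rng.normal(size=(m, p))  # Gaussian parameter perturbations
# Synthetic loss changes chosen so that X_true gives zero residual.
delta_losses = 0.5 * np.einsum("ij,jk,ik->i", deltas, X_true, deltas) - deltas @ g_f

H_hat = fit_psd_quadratic(deltas, g_f, delta_losses)
theta_star = np.zeros(p)
theta_uf = theta_star - np.linalg.solve(H_hat, g_f)  # Newton-style update
```

The dense fit has $p(p+1)/2$ unknowns, so it is only practical for small parameter blocks. Using `np.linalg.solve` avoids forming $\hat H_r^{-1}$ explicitly.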

2.2 Synthetic Adversarial and Noise-based Unlearning

Several frameworks generate adversarial or error-maximizing noise samples that act as "anti-examples" for the forget set; these are used to drive rapid forgetting via targeted parametric updates. Examples include:

Proxy adversarial data generation (ZS-PAG): Adversarial examples are synthesized starting from $D_f$, targeting alternative classes or maximizing existing misclassification likelihood (Chen et al., 29 Jul 2025). Updates are projected onto the orthogonal complement of the proxy retain-data subspace to prevent over-unlearning.
Error-maximizing/minimizing noise: Synthetic noise inputs are optimized to maximize the loss for the forget classes and, as proxies for $D_r$, to minimize it for the retained classes, followed by an impair/repair optimization on the parameters (Tarun et al., 2021, Chundawat et al., 2022).
Synthetic sample generation for CLIP: Gradient ascent in input space produces synthetic images for the forget class; local Lipschitz regularization is enforced on these (Kravets et al., 2024).
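The input-space gradient ascent behind error-maximizing noise can be illustrated on a toy model. The sketch below assumes a linear softmax classifier with random weights, which is a stand-in chosen for brevity rather than the architecture used in the cited works. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def cross_entropy(W, b, x, label):
    """Cross-entropy of a linear softmax classifier, via a stable log-sum-exp."""
    z = W @ x + b
    z_max = z.max()
    return z_max + np.log(np.exp(z - z_max).sum()) - z[label]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def error_maximizing_noise(W, b, label, steps=50, lr=0.5, seed=0):
    """Gradient ascent in input space on the loss of the forget class `label`."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=W.shape[1])          # start from random noise
    onehot = np.eye(W.shape[0])[label]
    for _ in range(steps):
        # d(loss)/dx for logits z = W x + b is W^T (softmax(z) - onehot).
        grad_x = W.T @ (softmax(W @ x + b) - onehot)
        x = x + lr * grad_x                  # ascent step: increase the loss
    return x

rng = np.random.default_rng(1)
W, b = rng.normal(size=(3, 5)), np.zeros(3)  # toy 3-class linear model (assumption)
x_init = error_maximizing_noise(W, b, label=0, steps=0)
x_noise = error_maximizing_noise(W, b, label=0, steps=50)
loss_init = cross_entropy(W, b, x_init, 0)
loss_noise = cross_entropy(W, b, x_noise, 0)
```

In the impair/repair schemes described above, noise of this kind would then be paired with the forget label to drive the parameter update. That step is omitted here.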

2.3 Architectural and Representation-Level Unlearning

Architectural approaches do not modify base model parameters, but intervene on representation bottlenecks or classifier heads:

Discrete key-value bottleneck masking: Codebook entries highly correlated with $D_f$ are identified and masked, which excises the forget class at inference immediately and without further training (Shah et al., 2023).
Nullspace projection for CLIP: The image embedding projection is modified so that its outputs lie in the orthogonal complement of the span of forget-class text embeddings, annihilating linear alignment with the forgotten class (Mishra et al., 16 Dec 2025).
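The nullspace projection idea reduces to a few lines of linear algebra. The sketch below assumes linearly independent forget-class text embeddings and uses random toy matrices in place of real CLIP weights. `nullspace_projector`, `W_img`, and the dimensions are hypothetical names and values.

```python
import numpy as np

def nullspace_projector(text_embeds):
    """Projector onto the orthogonal complement of span(rows of text_embeds)."""
    # Orthonormal basis of the span via reduced QR of the transposed matrix.
    Q, _ = np.linalg.qr(text_embeds.T)       # Q: (d, k)
    return np.eye(text_embeds.shape[1]) - Q @ Q.T

rng = np.random.default_rng(0)
d = 8
forget_text = rng.normal(size=(2, d))        # toy forget-class text embeddings (assumption)
W_img = rng.normal(size=(d, 16))             # toy image projection head (assumption)
P = nullspace_projector(forget_text)
W_img_unlearned = P @ W_img                  # compose the projector with the head

feat = rng.normal(size=16)
z = W_img_unlearned @ feat                   # projected image embedding
alignment = forget_text @ z                  # dot products with forget-class text
```

Because the projector is folded into the head once, inference cost is unchanged. Only the linear alignment with the forget-class directions is removed, which matches the "linear only" caveat.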

Representation unlearning learns a lightweight transformation $T_\phi$ of the model's representations by minimizing:

$\min_\phi\; \mathcal{L}_r^{ZS}(\phi) + \beta\, \mathcal{L}_f^{ZS}(\phi)$

where $\mathcal{L}_r^{ZS}$ anchors retained-data prototypes (in practice, final-layer weights), $\mathcal{L}_f^{ZS}$ repels forget-set representations from those prototypes, and $\beta$ weights the two terms. The "neural collapse" structure is what allows the final-layer weights to approximate retained information in a zero-shot setting (Almudévar et al., 29 Jan 2026).

2.4 Neuronal Path and Layerwise-Pruning Methods

These methods are interpretable and data-efficient: layer-wise relevance propagation (LRP) identifies the neurons most responsible for the forget class, which are then zeroed or pruned, breaking the neuronal path that produces the forgotten outputs with no further training (Chang et al., 2024). The LRP inputs come from public or generated data, so the original training samples are not accessed.
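A relevance-then-prune step can be sketched for a network with one ReLU hidden layer. This is an illustrative simplification rather than the cited method: it applies the LRP-epsilon rule to the last layer only and uses random weights and random probe inputs as stand-ins for a trained model and public or generated data. All names are hypothetical.

```python
import numpy as np

def hidden_relevance(W1, b1, W2, b2, probes, forget_class, eps=1e-6):
    """LRP-epsilon relevance of each hidden unit for the forget-class logit,
    averaged over probe inputs (one ReLU hidden layer)."""
    a = np.maximum(probes @ W1.T + b1, 0.0)              # (n, hidden)
    z_c = a @ W2[forget_class] + b2[forget_class]        # (n,)
    stabilizer = eps * np.where(z_c >= 0, 1.0, -1.0)
    # Start with relevance R_c = z_c and redistribute it in proportion to a_j * w_cj.
    rel = a * W2[forget_class] * (z_c / (z_c + stabilizer))[:, None]
    return rel.mean(axis=0)

def prune_top_units(W1, b1, W2, relevance, k):
    """Zero the k hidden units with the highest relevance; no retraining."""
    top = np.argsort(relevance)[-k:]
    W1p, b1p, W2p = W1.copy(), b1.copy(), W2.copy()
    W1p[top], b1p[top], W2p[:, top] = 0.0, 0.0, 0.0
    return W1p, b1p, W2p, top

def mean_logit(W1, b1, W2, b2, x, c):
    return float((np.maximum(x @ W1.T + b1, 0.0) @ W2[c] + b2[c]).mean())

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 10)), np.zeros(16)   # toy weights (assumption)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)
probes = rng.normal(size=(32, 10))                 # stand-in for public/generated inputs

rel = hidden_relevance(W1, b1, W2, b2, probes, forget_class=2)
W1p, b1p, W2p, pruned = prune_top_units(W1, b1, W2, rel, k=3)
before = mean_logit(W1, b1, W2, b2, probes, 2)
after = mean_logit(W1p, b1p, W2p, b2, probes, 2)
```

Pruning units with positive relevance removes their positive contributions to the forget-class logit. Because the pruned units also feed the other classes, retained outputs can shift, which is the collateral utility loss discussed for low-capacity backbones.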

3. Pseudocode Summaries

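Representative pseudocode for Hessian-estimation-based zero-shot unlearning (Ahmed et al., 20 Aug 2025):

g_f = grad_theta L(theta_star; D_f)            # forget-set gradient at theta_star
for i in range(m):
    delta_theta_i = Normal(0, sigma^2 * I)     # random Gaussian perturbation
    theta_i = theta_star + delta_theta_i
    delta_l_f_i = L(theta_i; D_f) - L(theta_star; D_f)
    # Accumulate (delta_theta_i, delta_l_f_i) for the SDP objective
H_hat = argmin over PSD X of the SDP objective (Section 2.1)
theta_uf = theta_star - H_hat^{-1} @ g_f       # Newton-style update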

Pseudocode for masking-based class unlearning (Shah et al., 2023):

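count = zeros(num_heads, num_keys)              # usage count per (head, key)
for x in D_f:
    h = E(x)                                    # encoder output, split into per-head slices h_c
    for c in heads:
        k_star_c = argmin_k ||h_c - e_{c,k}||   # nearest codebook key in head c
        count[c, k_star_c] += 1
M = top N_a (c, k) pairs by count               # keys most used by the forget set
for (c, k) in M:
    masked[c, k] = True                         # or set the key's distance to inf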
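A runnable version of the masking pseudocode might look as follows. It is a sketch under assumptions: encoder outputs are supplied directly as arrays already split per head, the codebook is random, and the toy forget set is constructed to sit next to key 0 in every head. The function names and shapes are hypothetical.

```python
import numpy as np

def select_keys_to_mask(h_forget, keys, n_mask):
    """h_forget: (n, C, d) per-head encoder outputs for the forget set;
    keys: (C, K, d) codebook. Returns a boolean (C, K) mask."""
    dists = np.linalg.norm(h_forget[:, :, None, :] - keys[None], axis=-1)  # (n, C, K)
    nearest = dists.argmin(axis=-1)                                       # (n, C)
    C, K = keys.shape[:2]
    counts = np.stack([np.bincount(nearest[:, c], minlength=K) for c in range(C)])
    top = np.argsort(counts, axis=None)[::-1][:n_mask]   # most-used (head, key) pairs
    mask = np.zeros((C, K), dtype=bool)
    mask[np.unravel_index(top, (C, K))] = True
    return mask

def lookup(h, keys, mask):
    """Nearest unmasked key per head for one input h of shape (C, d)."""
    dists = np.linalg.norm(h[:, None, :] - keys, axis=-1)  # (C, K)
    return np.where(mask, np.inf, dists).argmin(axis=-1)

rng = np.random.default_rng(0)
C, K, d = 2, 5, 8
keys = rng.normal(size=(C, K, d))
# Toy forget set: every sample sits next to key 0 in each head (assumption).
h_forget = keys[None, :, 0, :] + 0.01 * rng.normal(size=(20, C, d))
mask = select_keys_to_mask(h_forget, keys, n_mask=C)
routed = lookup(h_forget[0], keys, mask)
```

Setting masked distances to infinity leaves the codebook and all other parameters untouched, so the mask can be applied or lifted at inference time without any training.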

4. Empirical Benchmarks and Key Findings

Zero-shot unlearning methods are evaluated on standard benchmarks, including CIFAR-10, CIFAR-100, ImageNet derivatives, VGGFace, and CLIP-based retrieval datasets. Across methodologies, core metrics are:

Forget-set accuracy reduction (target near 0%).
Retain-set/test accuracy (target: minimal drop relative to full retrain).
Membership-Inference Attack (MIA) score (target: approach 50% for optimal unlearning).
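The three metrics can be computed with a few helper functions. The sketch below uses a simple confidence-threshold attack as the MIA, which is one common variant assumed here for illustration; the cited works may use different attacks. The function names and toy numbers are hypothetical.

```python
import numpy as np

def accuracy(pred, labels):
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))

def threshold_mia_score(member_conf, nonmember_conf):
    """Best balanced accuracy of a confidence-threshold attack that predicts
    'member' when confidence >= t. 0.5 means members are indistinguishable."""
    member_conf = np.asarray(member_conf, dtype=float)
    nonmember_conf = np.asarray(nonmember_conf, dtype=float)
    best = 0.5
    for t in np.unique(np.concatenate([member_conf, nonmember_conf])):
        tpr = np.mean(member_conf >= t)      # forget samples flagged as members
        tnr = np.mean(nonmember_conf < t)    # unseen samples flagged as non-members
        best = max(best, 0.5 * (tpr + tnr))
    return float(best)

forget_acc = accuracy([1, 3, 3, 0], [2, 2, 2, 2])   # 0.0: forget class never predicted
retain_acc = accuracy([0, 1, 1, 3], [0, 1, 1, 2])   # 0.75
# Confidences on forget samples (members) versus unseen samples (non-members).
leaky = threshold_mia_score([0.90, 0.95, 0.99], [0.30, 0.50, 0.60])
unlearned = threshold_mia_score([0.30, 0.50, 0.60], [0.30, 0.50, 0.60])
```

Here `leaky` reaches the maximum score because the two confidence distributions are perfectly separable, while `unlearned` stays at chance level, the target stated above.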

Notable empirical findings include:

Methodology	Forget Acc (lower is better)	Retain Acc Drop (lower is better)	Speedup	MIA Score	Dataset	Citation
Source-free Hessian	0–few %	2–5% pts	>100x vs. retrain	≈random	CIFAR-10/100, Caltech-256	(Ahmed et al., 20 Aug 2025)
Key-value masking	0%	≤0.5%	~50x vs. distillation	-	CIFAR-10/100, LACUNA-100	(Shah et al., 2023)
Nullspace CLIP	0%	≤7%	>1000x vs. retrain	≥70%	StanfordCars/Dogs, OxfordFlowers	(Mishra et al., 16 Dec 2025)
LRP-pruning	≈0%	<3%	>100x vs. retrain	1.0	MNIST/CIFAR-10/100	(Chang et al., 2024)
ZS-PAG subspace	<2%	≤2%	~10x vs. retrain	≈retrain	CIFAR-10/100, Facescrub	(Chen et al., 29 Jul 2025)
Codebook LLM	>>random	0–30% drop	Instantaneous	-	T5-small (opus_books)	(Wu et al., 2024)

Across the cited works, these methods are reported to substantially outperform baseline competitors (gradient ascent on the forget loss, random relabeling, fine-tuning, Amnesiac unlearning) and to match or closely approximate full retraining in forgetting efficacy, with significant computational gains (Ahmed et al., 20 Aug 2025, Chang et al., 2024, Mishra et al., 16 Dec 2025, Shah et al., 2023, Chen et al., 29 Jul 2025, Wu et al., 2024).

5. Theoretical and Practical Limitations

Current zero-shot unlearning methods provide varying levels of formal guarantees:

Parametric correction (Ahmed et al., 20 Aug 2025, Chen et al., 29 Jul 2025) and representation unlearning (Almudévar et al., 29 Jan 2026) achieve provable upper bounds on unlearning error/gradient norm, predicated on loss Lipschitzness, reliable Hessian/proxy estimation, or neural collapse hypotheses.
Bottleneck masking and nullspace projection approaches (Shah et al., 2023, Mishra et al., 16 Dec 2025) guarantee removal of information along masked/projected subspaces, but do not remove residual nonlinear or entangled information upstream.
LRP/pruning approaches (Chang et al., 2024) make no formal privacy proof but empirically erase target-output paths.

Key limitations include:

Difficulty in extending to arbitrary subsets in highly entangled feature spaces, where representation overlap between classes degrades selective masking efficiency (Shah et al., 2023, Mishra et al., 16 Dec 2025).
Potential non-optimality for example-level unlearning since most methods assume class-level partition structure.
Sensitivity to model redundancy: underparameterized or low-capacity backbones tend to incur higher collateral utility losses (Chang et al., 2024).
For methods using pseudo-labeling or adversarial proxy generation, the utility of the proxy set depends on the model's local decision boundary geometry; very flat or highly curved boundaries may degrade performance (Chen et al., 29 Jul 2025).
Regularizer and projection hyperparameter tuning remains empirically driven in the absence of direct retained-data metrics.

6. Extensions, Modalities, and Open Directions

Zero-shot unlearning methods are being extended to new modalities and operational regimes:

Personalized/federated unlearning with cryptographic verifiability (ZK-APEX): deterministic sparse-masking plus compensation, tractable for on-device edge settings, and amenable to zero-knowledge proofs (Maheri et al., 9 Dec 2025).
Training-free activation steering in TTS: dynamic intervention on internal hidden states enforces opt-out of seen and unseen speaker identities without retraining (Lee et al., 28 Jan 2026).
Few-shot zero-glance settings: generative feedback networks synthesize "Optimal Erasure Samples" for class forgetting using minimal retained data (Song et al., 17 Nov 2025).
LLMs: sparse autoencoder bottlenecks and codebook masking remove topic-specific information in texts (Wu et al., 2024).

Open challenges include formalizing unlearning for arbitrary sample subsets, developing certified privacy guarantees (e.g., $\epsilon$-differential privacy analogs for unlearning), and robustly managing information deletion in highly entangled or continually learned model spaces (Shah et al., 2023, Almudévar et al., 29 Jan 2026, Maheri et al., 9 Dec 2025).

7. Summary Table of Selected Zero-Shot Unlearning Algorithms

Approach	Core Mechanism	Data Requirement	Main Guarantee or Limitation	Exemplary Paper
Hessian Estimation + Influence	Surrogate $H_r$ estimation	$D_f$, $\theta^*$	Grad. norm bound, close to retrain	(Ahmed et al., 20 Aug 2025)
Discrete Masking (DKVB/VQ)	Architectural masking	$D_f$	Inference-only, minimal compute, class-level	(Shah et al., 2023)
Nullspace Projection (CLIP)	Linear projection in head	Forget-class text embeds	Exact class line removal, linear only	(Mishra et al., 16 Dec 2025)
LRP Neuronal Pruning	Layer-wise relevance/pruning	Synthetic/public $D_f$	Empirical validity, fast, interpretable	(Chang et al., 2024)
Adversarial Proxy + Subspace	Synthetic proxy batch + proj.	$D_f$, $\theta^*$	Provable retain-set utility retention	(Chen et al., 29 Jul 2025)
Representation Bottleneck	SAE, codebook mask	Task/contrast data	KL divergence change on target topic	(Wu et al., 2024)
Activation Steering (TTS)	Hidden vector projection	Reference utterances	Training-free, dynamic opt-out	(Lee et al., 28 Jan 2026)