Zero-Shot Machine Unlearning

Updated 13 March 2026
  • Zero-shot machine unlearning techniques efficiently remove the influence of targeted data without accessing the original training set, using methods such as Hessian estimation and adversarial noise.
  • Methodologies span parametric corrections, synthetic adversarial frameworks, and architectural masking to balance robust forgetting with minimal performance degradation.
  • Empirical evaluations on benchmarks such as CIFAR-10 and ImageNet demonstrate significant computational speedups and close approximations to full retraining in terms of unlearning efficacy.

Zero-shot machine unlearning comprises a class of methodologies designed to efficiently remove the influence of specific data (e.g., individual samples, whole classes, or user identities) from trained models, under the stringent constraint of not requiring access to the original training set or, sometimes, any data at all except the explicit forget set. Such methods are crucial for privacy and regulatory compliance in contemporary machine learning systems, especially where data residency, sensitivity, or operational constraints preclude standard retraining or data replay. Modern approaches span parametric, representation, and architecture-level interventions, provide varying forms of theoretical guarantees, and achieve a spectrum of trade-offs between forgetting, performance retention, and scalability.

1. Formal Problem Settings and Unlearning Guarantees

Zero-shot machine unlearning is defined by the following constraints. Starting from a pretrained model $\theta^*$, only the model parameters and the forget set $D_f$ are accessible (in some regimes, only class/index labels), while the retained set $D_r$ is unavailable. The goal is to obtain unlearned parameters $\theta_{uf}$ such that, for inputs $x$ from $D_f$, the predictive fidelity of $f_{\theta_{uf}}(x)$ is minimized (forgetting), while for $x$ from $D_r$ the predictions remain as close as possible to those of $f_{\theta^*}$ or of an ideally retrained model $f_{\theta^r}$.

Theoretical unlearning guarantees are expressed as parameter- or output-indistinguishability from an explicitly retrained model, as explicit bounds on the gradient norm over $D_r$, or through information-theoretic proxies (e.g., reduction in mutual information between the model and $D_f$) (Ahmed et al., 20 Aug 2025, Almudévar et al., 29 Jan 2026, Foster et al., 2024). Typical results take the form:

$$\left\|\nabla_\theta L(\theta_{uf}; D_r)\right\|_2 \leq O\left(\frac{n_f^2}{n_r}\right)$$

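where $n_f = |D_f|$, $n_r = |D_r|$, and $n = n_f + n_r$. Because the retrained model minimizes $L(\theta; D_r)$, a small retain-set gradient norm means $\theta_{uf}$ is nearly stationary for the retain loss. The bound therefore indicates convergence of $\theta_{uf}$ towards the retrained optimum as the forget fraction $n_f/n$ decreases and the Hessian/proxy estimation error is controlled.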

2. Main Methodological Paradigms

2.1 Parametric Correction via Unknown Retain-data Hessian Estimation

A central paradigm is the use of influence-function/Newton-style correction, operationalized when $D_r$ is unknown: estimate the Hessian $H_r$ of $L(\theta; D_r)$ using only perturbations and statistics derived from $D_f$ (Ahmed et al., 20 Aug 2025). The methodology draws random Gaussian parameter perturbations and matches second-order Taylor expansions of the loss over $D_f$ to solve for $H_r$ via a semidefinite program:

$$\hat H_r = \arg\min_{X \succeq 0} \frac{1}{m}\sum_{i=1}^m \left(\frac{1}{2} \delta\theta_i^T X \delta\theta_i - g_f^T \delta\theta_i - \Delta \ell_f(\theta_i)\right)^2$$

After solving for $\hat H_r$, the unlearning update is:

$$\theta_{uf} = \theta^* - \hat H_r^{-1} g_f$$

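where $g_f = \nabla_\theta L(\theta^*; D_f)$, the $m$ perturbations are $\delta\theta_i \sim \mathcal{N}(0, \sigma^2 I)$ with $\theta_i = \theta^* + \delta\theta_i$, and $\Delta\ell_f(\theta_i) = L(\theta_i; D_f) - L(\theta^*; D_f)$. This approach yields unlearning with provable fidelity to the ideally retrained model, provided the Hessian estimation error is controlled (Ahmed et al., 20 Aug 2025).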
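The fitting step can be made concrete on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation. It replaces the semidefinite program with an ordinary least-squares fit followed by projection onto the PSD cone (eigenvalue clipping). It also generates the loss changes from a hypothetical exact quadratic, so the recovered matrix can be checked against a known `H_true`. The residual keeps the sign convention of the objective above. `fit_psd_quadratic`, `H_true`, and `g` are illustrative names.

```python
import numpy as np

def fit_psd_quadratic(deltas, delta_losses, g, eps=0.0):
    """Fit PSD X so that 0.5 * delta^T X delta - g^T delta ~= delta_loss for each perturbation.

    Least squares plus eigenvalue clipping stands in for the SDP (an assumption).
    """
    m, d = deltas.shape
    # Quadratic features 0.5 * delta_a * delta_b for every (a, b), flattened to length d * d.
    features = 0.5 * np.einsum("ia,ib->iab", deltas, deltas).reshape(m, d * d)
    # The linear term is known, so move it to the right-hand side.
    targets = delta_losses + deltas @ g
    x_flat, *_ = np.linalg.lstsq(features, targets, rcond=None)
    X = x_flat.reshape(d, d)
    X = 0.5 * (X + X.T)                         # symmetrize
    eigvals, eigvecs = np.linalg.eigh(X)
    eigvals = np.clip(eigvals, eps, None)       # project onto the PSD cone
    return (eigvecs * eigvals) @ eigvecs.T

rng = np.random.default_rng(0)
d, m, sigma = 4, 200, 0.1
A = rng.normal(size=(d, d))
H_true = A @ A.T + np.eye(d)                    # hypothetical ground-truth Hessian
g = rng.normal(size=d)                          # hypothetical forget-set gradient g_f
deltas = rng.normal(0.0, sigma, size=(m, d))    # delta_theta_i ~ N(0, sigma^2 I)
# Synthetic loss changes that follow the quadratic model exactly.
delta_losses = 0.5 * np.einsum("ia,ab,ib->i", deltas, H_true, deltas) - deltas @ g
H_hat = fit_psd_quadratic(deltas, delta_losses, g)
print(np.allclose(H_hat, H_true, atol=1e-6))    # True
```

The fit has $d^2$ unknowns and needs at least $d(d+1)/2$ well-spread perturbations. This brute-force version is therefore only practical for very small parameter vectors.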

2.2 Synthetic Adversarial and Noise-based Unlearning

Several frameworks generate adversarial or error-maximizing noise samples that act as "anti-examples" for the forget set; these are used to drive rapid forgetting via targeted parametric updates. Examples include:

  • Proxy adversarial data generation (ZS-PAG): Adversarial examples are synthesized starting from $D_f$, targeting alternative classes or maximizing existing misclassification likelihood (Chen et al., 29 Jul 2025). Updates are projected into the orthogonal complement of the proxy retain-data subspace to prevent over-unlearning.
  • Error-maximizing/minimizing noise: Synthetic inputs are created to maximize loss on $D_f$ and minimize it on $D_r$, followed by an impair/repair optimization on parameters (Tarun et al., 2021, Chundawat et al., 2022).
  • Synthetic sample generation for CLIP: Gradient ascent in input space produces synthetic images for the forget class; local Lipschitz regularization is enforced on these (Kravets et al., 2024).
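The error-maximizing step can be sketched as gradient ascent in input space. This toy version makes two assumptions: a fixed linear softmax classifier stands in for a deep network, and a hypothetical L2 penalty `lam` keeps the noise bounded. It omits the impair/repair parameter updates of the cited methods and any error-minimizing noise for retained classes. All function names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(W, b, X, label):
    """Per-example cross-entropy of a linear softmax model for one fixed label."""
    p = softmax(X @ W.T + b)
    return -np.log(p[:, label] + 1e-12)

def error_maximizing_noise(W, b, forget_label, n=16, steps=100, lr=0.5, lam=0.1, seed=0):
    """Gradient ascent in input space on CE(forget_label) - 0.5 * lam * ||x||^2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, W.shape[1]))
    onehot = np.zeros(W.shape[0])
    onehot[forget_label] = 1.0
    for _ in range(steps):
        p = softmax(X @ W.T + b)             # (n, k) class probabilities
        grad = (p - onehot) @ W - lam * X    # gradient of the regularized objective w.r.t. X
        X = X + lr * grad                    # ascent step
    return X

rng = np.random.default_rng(1)
W, b = rng.normal(size=(3, 8)), np.zeros(3)  # hypothetical 3-class linear model
noise = error_maximizing_noise(W, b, forget_label=0)
print(cross_entropy(W, b, noise, 0).mean())  # well above log(3) ~ 1.10, the loss at x = 0
```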

2.3 Architectural and Representation-Level Unlearning

Architectural approaches do not modify base model parameters, but intervene on representation bottlenecks or classifier heads:

  • Discrete key-value bottleneck masking: Codebook entries highly correlated with $D_f$ are identified and masked, instantly and compute-free, excising the forget class at inference (Shah et al., 2023).
  • Nullspace projection for CLIP: The image embedding projection is replaced by the orthogonal complement to the span of forget-class text embeddings, annihilating linear alignment to the forgotten class (Mishra et al., 16 Dec 2025).
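A minimal NumPy sketch of the nullspace projection follows. It assumes the forget-class text embeddings (e.g., several prompt embeddings for one class) are stacked as rows of a matrix, and that image embeddings are re-normalized after projection. Random vectors stand in for CLIP embeddings. Where exactly the projection sits in the image head is also an assumption here.

```python
import numpy as np

def nullspace_projector(forget_text_emb, tol=1e-10):
    """Return P = I - Q Q^T, the orthogonal projector onto the complement of span(rows)."""
    # Orthonormal basis of the row space via SVD; drop numerically zero directions.
    _, s, vt = np.linalg.svd(forget_text_emb, full_matrices=False)
    Q = vt[s > tol * s.max()].T                  # (d, r) basis of the forget-class subspace
    return np.eye(forget_text_emb.shape[1]) - Q @ Q.T

def project_image_embeddings(img_emb, P):
    z = img_emb @ P                              # P is symmetric, so this applies P row-wise
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d = 16
forget_text = rng.normal(size=(2, d))            # hypothetical prompt embeddings for one class
images = rng.normal(size=(5, d))
P = nullspace_projector(forget_text)
z = project_image_embeddings(images, P)
print(np.abs(z @ forget_text.T).max() < 1e-10)   # True: no linear alignment with the forget class
```

Only the components of the image embeddings that lie in the forget-class text span are removed. Everything orthogonal to that span passes through unchanged.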

Representation unlearning employs a lightweight transformation $T_\phi$, trained by minimizing:

$$\min_\phi\; \mathcal{L}_r^{ZS}(\phi) + \beta\, \mathcal{L}_f^{ZS}(\phi)$$

where $\mathcal{L}_r^{ZS}$ anchors retained-data prototypes (in practice, the final-layer weights), and $\mathcal{L}_f^{ZS}$ repels forget-set representations from the retained prototypes. This leverages the "neural collapse" structure to approximate retained information in a zero-shot setting (Almudévar et al., 29 Jan 2026).

2.4 Neuronal Path and Layerwise-Pruning Methods

This family of methods is interpretable and data-efficient. Layer-wise relevance propagation (LRP) identifies the neurons most responsible for the forget class, and these are directly zeroed or pruned. This breaks the neuronal path responsible for the forgotten outputs with no further training (Chang et al., 2024). The method relies on public or generated data as LRP input and does not access original training samples.
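The following toy sketch illustrates relevance-guided pruning under several assumptions:

- a two-layer ReLU network;
- the LRP-ε rule applied at the output layer only;
- relevance averaged over a batch of public or synthetic forget-class inputs;
- a hypothetical pruning budget `k`.

The helper names are illustrative. The cited method's propagation rules, target layers, and selection criterion may differ.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    H = np.maximum(0.0, X @ W1.T + b1)      # hidden ReLU activations, shape (n, h)
    return H, H @ W2.T + b2                 # (activations, logits of shape (n, k))

def hidden_relevance(params, X, cls, eps=1e-6):
    """LRP-epsilon at the output layer: split the logit z_c over hidden neurons."""
    W2 = params[2]
    H, Z = forward(params, X)
    zc = Z[:, cls]
    denom = zc + eps * np.where(zc >= 0, 1.0, -1.0)
    # R_j = h_j * W2[c, j] / (z_c + eps * sign(z_c)) * R_c, starting from R_c = z_c
    return H * W2[cls] * (zc / denom)[:, None]

def prune_for_class(params, X_forget, cls, k):
    """Zero the k hidden neurons with the highest mean relevance for class `cls`."""
    W1, b1, W2, b2 = (p.copy() for p in params)
    rel = hidden_relevance(params, X_forget, cls).mean(axis=0)
    top = np.argsort(rel)[::-1][:k]         # most positively relevant neurons
    W1[top] = 0.0                           # no input reaches these neurons...
    b1[top] = 0.0
    W2[:, top] = 0.0                        # ...and they no longer feed any logit
    return (W1, b1, W2, b2), top

rng = np.random.default_rng(0)
params = (rng.normal(size=(32, 10)), np.zeros(32), rng.normal(size=(4, 32)), np.zeros(4))
X_forget = rng.normal(size=(64, 10))        # stand-in for public/synthetic class-0 inputs
pruned, neurons = prune_for_class(params, X_forget, cls=0, k=4)
```

With zero output bias and small ε, the hidden relevances of each input sum to the class logit (LRP's conservation property). This gives a quick sanity check on the implementation.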

3. Pseudocode Summaries

Representative pseudocode for Hessian-estimation-based zero-shot unlearning (Ahmed et al., 20 Aug 2025):

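import numpy as np

def hessian_unlearn(theta_star, loss_f, grad_f, solve_sdp, m=100, sigma=1e-2, seed=0):
    # loss_f / grad_f evaluate L(.; D_f) and its gradient; solve_sdp is a hypothetical caller-supplied SDP solver
    rng = np.random.default_rng(seed)
    g_f = grad_f(theta_star)
    base = loss_f(theta_star)
    deltas = rng.normal(0.0, sigma, size=(m, theta_star.size))  # delta_theta_i ~ N(0, sigma^2 I)
    delta_l_f = np.array([loss_f(theta_star + delta) - base for delta in deltas])  # Delta l_f(theta_i)
    H_hat = solve_sdp(deltas, g_f, delta_l_f)  # arg min over X >= 0 of the Taylor-matching objective
    return theta_star - np.linalg.solve(H_hat, g_f)  # theta_uf = theta* - H_hat^{-1} g_f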

Pseudocode for masking-based class unlearning (Shah et al., 2023):

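import numpy as np

def mask_forget_keys(encode, forget_inputs, codebooks, n_a):
    # codebooks: (C, K, d) key embeddings e_{c,k}; encode(x) returns (C, d) head queries h_c
    C, K, _ = codebooks.shape
    count = np.zeros((C, K), dtype=int)
    for x in forget_inputs:
        h = encode(x)
        dists = np.linalg.norm(h[:, None, :] - codebooks, axis=-1)  # ||h_c - e_{c,k}||
        count[np.arange(C), dists.argmin(axis=1)] += 1              # nearest key per head
    top = np.argsort(count, axis=None)[::-1][:n_a]                  # N_a most-used (c, k) pairs
    masked = np.zeros((C, K), dtype=bool)
    masked[np.unravel_index(top, (C, K))] = True   # excluded at inference (or set distance to inf)
    return masked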
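At inference, masked entries are excluded from the nearest-key lookup, which is equivalent to setting their distance to infinity. Below is a hypothetical sketch of that lookup. It assumes per-head codebooks of shape `(C, K, d)`, a boolean mask `masked[c, k]` as in the pseudocode above, and at least one unmasked key per head.

```python
import numpy as np

def lookup_keys(h, codebooks, masked):
    """Nearest unmasked key per head; masked entries get distance +inf."""
    dists = np.linalg.norm(h[:, None, :] - codebooks, axis=-1)  # (C, K)
    dists = np.where(masked, np.inf, dists)
    return dists.argmin(axis=1)              # selected key index for each head

rng = np.random.default_rng(0)
C, K, d = 2, 5, 3
codebooks = rng.normal(size=(C, K, d))
h = codebooks[:, 0, :].copy()                # query sits exactly on key 0 of every head
masked = np.zeros((C, K), dtype=bool)
print(lookup_keys(h, codebooks, masked))     # [0 0]
masked[:, 0] = True                          # forget key 0 in every head
print(lookup_keys(h, codebooks, masked))     # key 0 can no longer be selected
```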

4. Empirical Benchmarks and Key Findings

Zero-shot unlearning methods are evaluated on standard benchmarks, including CIFAR-10, CIFAR-100, ImageNet derivatives, VGGFace, and CLIP-based retrieval datasets. Across methodologies, core metrics are:

  • Forget-set accuracy (target: near 0%).
  • Retain-set/test accuracy (target: minimal drop relative to full retraining).
  • Membership-inference attack (MIA) score (target: close to 50%, i.e., chance level, for optimal unlearning).
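The three metrics can be computed from model predictions and per-example losses. The sketch below illustrates them under stated assumptions. The MIA is a simple loss-threshold attack that reports the best balanced accuracy for separating forget-set examples (treated as members) from held-out non-members. A score of 0.5 means the attacker does no better than chance. Published evaluations may use other attacks (for example, classifier-based ones), so scores are not directly comparable across papers.

```python
import numpy as np

def accuracy(preds, labels):
    """Fraction of correct predictions (forget-set, retain-set, or test accuracy)."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def loss_threshold_mia(member_losses, nonmember_losses):
    """Best balanced accuracy of the rule 'loss <= t means member' over thresholds t."""
    m = np.asarray(member_losses, dtype=float)
    n = np.asarray(nonmember_losses, dtype=float)
    best = 0.5                                # trivial attacker (always guess one side)
    for t in np.concatenate([m, n]):
        tpr = np.mean(m <= t)                 # members correctly flagged
        tnr = np.mean(n > t)                  # non-members correctly rejected
        best = max(best, 0.5 * (tpr + tnr))
    return float(best)

print(accuracy([2, 2, 0, 1], [0, 0, 0, 1]))          # 0.5
print(loss_threshold_mia([0.1, 0.2], [1.0, 2.0]))    # 1.0: forget set still looks "seen"
print(loss_threshold_mia([1.0, 2.0], [1.0, 2.0]))    # 0.5: indistinguishable
```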

Notable empirical findings include:

| Methodology | Forget Acc ↓ | Retain Acc Drop ↓ | Speedup | MIA Score | Dataset | Citation |
|---|---|---|---|---|---|---|
| Source-free Hessian | 0–few % | 2–5 % pts | >100× vs. retrain | ≈ random | CIFAR-10/100, Caltech-256 | (Ahmed et al., 20 Aug 2025) |
| Key-value masking | 0% | ≤0.5% | ~50× vs. distillation | - | CIFAR-10/100, LACUNA-100 | (Shah et al., 2023) |
| Nullspace CLIP | 0% | ≤7% | >1000× vs. retrain | ≥70% | StanfordCars/Dogs, OxfordFlowers | (Mishra et al., 16 Dec 2025) |
| LRP-pruning | ≈0% | <3% | >100× vs. retrain | 1.0 | MNIST, CIFAR-10/100 | (Chang et al., 2024) |
| ZS-PAG subspace | <2% | ≤2% | ~10× vs. retrain | ≈ retrain | CIFAR-10/100, Facescrub | (Chen et al., 29 Jul 2025) |
| Codebook LLM | ≫ random | 0–30% | Instantaneous | - | T5-small (opus_books) | (Wu et al., 2024) |

All of these methods report substantial improvements over baseline zero-shot competitors (negative gradient/gradient ascent, random relabeling, fine-tuning, Amnesiac). They also match or closely approximate full retraining in forgetting efficacy, at a fraction of the computational cost (Ahmed et al., 20 Aug 2025, Chang et al., 2024, Mishra et al., 16 Dec 2025, Shah et al., 2023, Chen et al., 29 Jul 2025, Wu et al., 2024).

5. Theoretical and Practical Limitations

Current zero-shot unlearning methods provide varying levels of formal guarantees.

Key limitations include:

  • Difficulty in extending to arbitrary subsets in highly entangled feature spaces, where representation overlap between classes degrades selective masking efficiency (Shah et al., 2023, Mishra et al., 16 Dec 2025).
  • Potential non-optimality for example-level unlearning, since most methods assume a class-level partition structure.
  • Sensitivity to model redundancy: underparameterized or low-capacity backbones tend to incur higher collateral utility losses (Chang et al., 2024).
  • For methods using pseudo-labeling or adversarial proxy generation, the utility of the proxy set depends on the model's local decision boundary geometry; very flat or highly curved boundaries may degrade performance (Chen et al., 29 Jul 2025).
  • Regularizer and projection hyperparameter tuning remains empirically driven in the absence of direct retained-data metrics.

6. Extensions, Modalities, and Open Directions

Zero-shot unlearning methods are being extended to new modalities and operational regimes.

Open challenges include formalizing unlearning for arbitrary sample subsets, developing certified privacy guarantees (e.g., $\epsilon$-differential privacy analogs for unlearning), and robustly managing information deletion in highly entangled or continually learned model spaces (Shah et al., 2023, Almudévar et al., 29 Jan 2026, Maheri et al., 9 Dec 2025).

7. Summary Table of Selected Zero-Shot Unlearning Algorithms

| Approach | Core Mechanism | Data Requirement | Main Guarantee or Limitation | Exemplary Paper |
|---|---|---|---|---|
| Hessian Estimation + Influence | Surrogate $H_r$ estimation | $D_f$, $\theta^*$ | Gradient-norm bound, close to retrain | (Ahmed et al., 20 Aug 2025) |
| Discrete Masking (DKVB/VQ) | Architectural masking | $D_f$ | Inference-only, minimal compute, class-level | (Shah et al., 2023) |
| Nullspace Projection (CLIP) | Linear projection in head | Forget-class text embeddings | Exact removal of the class direction, linear only | (Mishra et al., 16 Dec 2025) |
| LRP Neuronal Pruning | Layer-wise relevance + pruning | Synthetic/public $D_f$ | Empirical validity, fast, interpretable | (Chang et al., 2024) |
| Adversarial Proxy + Subspace | Synthetic proxy batch + projection | $D_f$, $\theta^*$ | Provable retain-set utility retention | (Chen et al., 29 Jul 2025) |
| Representation Bottleneck | SAE, codebook mask | Task/contrast data | KL divergence change on target topic | (Wu et al., 2024) |
| Activation Steering (TTS) | Hidden vector projection | Reference utterances | Training-free, dynamic opt-out | (Lee et al., 28 Jan 2026) |

Zero-shot machine unlearning provides an expanding toolkit for data-deletion compliance in modern learning systems, achieving significant progress in computational tractability, guarantee strength, and cross-modality applicability (Ahmed et al., 20 Aug 2025, Shah et al., 2023, Mishra et al., 16 Dec 2025, Chang et al., 2024, Chen et al., 29 Jul 2025, Wu et al., 2024, Maheri et al., 9 Dec 2025, Almudévar et al., 29 Jan 2026, Song et al., 17 Nov 2025, Lee et al., 28 Jan 2026).
