Zero-Shot Unlearning

Updated 3 July 2026
  • Zero-shot unlearning removes data-induced knowledge from trained models without access to the original training data and without full retraining, in support of regulatory compliance.
  • It employs methodologies such as closed-form projections, adversarial proxy synthesis, and gradient manipulations to target and erase specific information while retaining overall functionality.
  • Empirical results for vision-language models, text-to-speech (TTS), and federated systems report near-complete forgetting and strong privacy metrics with minimal loss on retained tasks.

Zero-shot unlearning is a family of techniques for selectively erasing data-induced knowledge from trained machine learning models, under the stringent constraint that no access to the original training data (except potentially the forget set) is permitted during the unlearning process. This paradigm addresses critical requirements for privacy, regulatory compliance (e.g., GDPR “right to be forgotten”), model decontamination, and downstream risk mitigation in both foundation models and domain-specific neural architectures. Zero-shot unlearning encompasses a variety of computational strategies including closed-form feature-space projections, adversarial proxy synthesis, gradient or subspace manipulations, and architectural design for post hoc instance removal. The core technical challenge is to precisely “forget” specific information while maintaining model fidelity on all unrelated tasks, often in highly overparameterized, multimodal, or structured prediction settings.

1. Foundational Principles and Problem Settings

Zero-shot unlearning is formalized as the process of transforming a pretrained model $f_\theta$ into a new model $f_{\theta'}$ such that the influence of a designated forget set $\mathcal{D}_f$ (typically a class, individual, or statistical subset) is effectively erased, while utility on the retain set $\mathcal{D}_r$ is preserved. Critically, access to $\mathcal{D}_r$, which conventional machine unlearning requires for re-optimization or influence estimation, is explicitly disallowed. In the strictest versions, only model weights and metadata about $\mathcal{D}_f$ (e.g., class labels or prompt text) are available (Chundawat et al., 2022, Chen et al., 29 Jul 2025).

Contemporary frameworks extend zero-shot unlearning across multiple modalities and deployment settings, including vision-language models, text-to-speech, and federated systems.

The zero-shot constraint is distinct from “data-free” settings, as the latter may permit some retained proxy samples or distillation from intermediate representations.

2. Methodological Taxonomy of Zero-Shot Unlearning

Zero-shot unlearning strategies divide into several major classes:

2.1 Closed-form Feature-Space Projections

  • Nullspace/Orthogonal projections: By constructing an orthonormal basis for the subspace spanned by the forget set (e.g., text and/or visual prototypes in CLIP), unlearning is achieved by projecting features onto the orthogonal complement of this span (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025). The core operator is $P = I - UU^\top$, where the columns of $U$ form an orthonormal basis of the forget subspace. This method enables efficient, data-free erasure at test time, without retraining.
  • Partial (soft) projection: A linear transform $W$ is selected to minimize a joint functional that penalizes components along forget directions while preserving retain-class structure, yielding a tunable tradeoff between erasure and utility (Mishra et al., 16 Dec 2025).
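The projection operator above can be illustrated with a minimal NumPy sketch. It is not the cited authors' implementation: the random prototypes and features are stand-ins for real CLIP embeddings, the helper names `forget_basis` and `projector` are hypothetical, and the `strength` parameter is only a simplified stand-in for the soft variant (the cited work selects $W$ by minimizing a joint objective instead).

```python
import numpy as np

def forget_basis(prototypes: np.ndarray) -> np.ndarray:
    """Return U (d x k) with orthonormal columns spanning the forget prototypes.

    `prototypes` has shape (k, d), one embedding per forget concept, and is
    assumed to have linearly independent rows.
    """
    # Reduced QR of the (d x k) matrix orthonormalizes the prototype directions.
    q, _ = np.linalg.qr(prototypes.T)
    return q

def projector(U: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Build P = I - strength * U U^T.

    strength=1.0 gives the hard projector P = I - U U^T from the text.
    Values in (0, 1) are an illustrative assumption, not the cited soft method.
    """
    d = U.shape[0]
    return np.eye(d) - strength * (U @ U.T)

rng = np.random.default_rng(0)
d, k = 16, 2
prototypes = rng.normal(size=(k, d))   # stand-ins for forget-class prototypes
features = rng.normal(size=(5, d))     # stand-ins for test-time embeddings

U = forget_basis(prototypes)
P = projector(U)
# P is symmetric, so right-multiplication applies it to every row.
cleaned = features @ P
half = features @ projector(U, strength=0.5)
```

After the hard projection, every cleaned feature has zero inner product with each forget prototype, so similarity-based scoring against those prototypes carries no signal. With `strength=0.5` the component along the forget subspace is halved rather than removed.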

2.2 Adversarial Proxy and Subspace Modeling

  • Proxy data generation: When $\mathcal{D}_r$ is inaccessible, adversarial perturbations of the forget set are optimized to cross the decision boundary into surrogate classes, forming a proxy for the retained distribution (Chen et al., 29 Jul 2025). Singular value decomposition (SVD) is then used to identify the retained-feature subspace, enabling subspace-constrained gradient updates that prevent over-unlearning on the retain set $\mathcal{D}_r$.
  • Statistical estimation of Hessians or Fisher matrices: For parameter-centric models with convex losses, source-free unlearning can estimate the Hessian of the remaining data via second-order Taylor expansions, using only small perturbations and the resulting loss differences (Ahmed et al., 20 Aug 2025). Newton-type updates are then computed in closed form.
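The subspace-constrained update in the first bullet can be sketched as follows. This is an illustration under stated assumptions rather than the ZS-PAG algorithm itself: the proxy features are random stand-ins for features of adversarially generated proxies, the `energy` threshold and the helper names are hypothetical, and the update is assumed to be constrained by removing the gradient component that lies inside the retained-feature subspace.

```python
import numpy as np

def retained_subspace(proxy_features: np.ndarray, energy: float = 0.95) -> np.ndarray:
    """Return V (d x r): top right-singular vectors of the proxy feature matrix.

    r is the smallest rank whose squared singular values reach `energy` of
    the total. The threshold is an illustrative choice.
    """
    _, s, vt = np.linalg.svd(proxy_features, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    rank = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return vt[:rank].T

def constrain_gradient(grad: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Remove the component of `grad` that lies in the retained subspace."""
    return grad - V @ (V.T @ grad)

rng = np.random.default_rng(0)
d = 8
basis = rng.normal(size=(3, d))
proxy_features = rng.normal(size=(50, 3)) @ basis  # proxies live in a 3-dim subspace
V = retained_subspace(proxy_features)

w = rng.normal(size=d)        # weights of a single linear unit
grad = rng.normal(size=d)     # stand-in for a forgetting gradient
step = constrain_gradient(grad, V)
w_new = w - 0.1 * step

# Any feature inside span(V) produces the same response before and after.
retained_like = rng.normal(size=(4, V.shape[1])) @ V.T
```

Because the constrained step is orthogonal to the retained subspace, the unit's response to features inside that subspace is unchanged by the update, which is the sense in which the constraint guards against over-unlearning in this toy setting.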

2.3 Data Synthesis, Distillation, and Decoupled Representations

2.4 Network Path Disruption and Relevance Analysis

  • Layer-wise relevance analysis (LRA): Highly relevant neurons for the forget class are detected via backward relevance propagation using only auxiliary proxies, and their outgoing weights are dropped or re-randomized (neuronal path perturbation, NPP). This severs classification paths while preserving utility (Chang et al., 2024).
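A toy version of relevance-guided path perturbation is sketched below for a two-layer ReLU network. It rests on several assumptions: relevance is approximated by each hidden neuron's mean contribution to the forget-class logit (a simplification of backward relevance propagation), the proxies are random inputs, and the function names and `top_k` are hypothetical. It is not the implementation from the cited work.

```python
import numpy as np

def hidden_relevance(x, W1, W2, forget_class):
    """Mean contribution of each hidden neuron to the forget-class logit."""
    hidden = np.maximum(x @ W1, 0.0)               # (n, h) ReLU activations
    return (hidden * W2[:, forget_class]).mean(axis=0)

def perturb_paths(W2, relevance, top_k, rng=None):
    """Drop (or re-randomize) outgoing weights of the top_k most relevant neurons.

    Returns a modified copy of W2 and the indices of the selected neurons.
    """
    top = np.argsort(relevance)[-top_k:]
    W2_new = W2.copy()
    if rng is None:
        W2_new[top, :] = 0.0                       # drop the outgoing weights
    else:
        W2_new[top, :] = rng.normal(scale=W2.std(), size=(top_k, W2.shape[1]))
    return W2_new, top

rng = np.random.default_rng(0)
n, d_in, d_hidden, n_classes = 64, 10, 32, 5
W1 = rng.normal(size=(d_in, d_hidden))
W2 = rng.normal(size=(d_hidden, n_classes))
proxies = rng.normal(size=(n, d_in))               # auxiliary proxies, no real data
forget_class = 2

relevance = hidden_relevance(proxies, W1, W2, forget_class)
W2_dropped, dropped = perturb_paths(W2, relevance, top_k=4)

def mean_forget_logit(W_out):
    """Average forget-class logit over the proxy inputs."""
    return (np.maximum(proxies @ W1, 0.0) @ W_out)[:, forget_class].mean()
```

In this sketch the mean forget-class logit equals the sum of the per-neuron relevances, so dropping the most relevant neurons lowers it directly. Zeroing whole rows also affects other classes, which is why the number of perturbed neurons is kept small.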

2.5 Inference-Time and Structural Approaches

  • Inference-time steering: In TTS, zero-shot unlearning occurs via dynamic, layer-selective subtraction of speaker-specific components from hidden activations at inference, suppressing identity with no retraining (Lee et al., 28 Jan 2026).
  • Unlearning by design: Models such as MUNKEY are architected from the start for key-based instant forgetting, sidestepping gradient updates altogether (Laguna et al., 16 Mar 2026).
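Layer-selective inference-time steering can be sketched with a small stack of dense layers. This is a schematic illustration, not the TruS method: the network, the per-layer speaker directions (random here, whereas a real system would estimate them from activations), and the names `steer`, `run_layers`, and `steer_layers` are all assumptions made for the example.

```python
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Subtract the component of each hidden vector along `direction`."""
    v = direction / np.linalg.norm(direction)
    return hidden - strength * np.outer(hidden @ v, v)

def run_layers(x, weights, directions, steer_layers):
    """Forward pass that steers activations only at the selected layers.

    Model weights are never modified; steering happens on activations.
    """
    h = x
    for i, W in enumerate(weights):
        h = np.tanh(h @ W)
        if i in steer_layers:
            h = steer(h, directions[i])
    return h

rng = np.random.default_rng(0)
d = 12
weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
directions = {i: rng.normal(size=d) for i in range(3)}  # hypothetical speaker directions
x = rng.normal(size=(4, d))

baseline = run_layers(x, weights, directions, steer_layers=set())
steered = run_layers(x, weights, directions, steer_layers={1, 2})
```

Since only activations are altered, the same weights serve both opted-out and ordinary requests, and steering can be switched on per request by choosing `steer_layers`.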

3. Formal Guarantees, Theoretical Results, and Metrics

Zero-shot unlearning research addresses both empirical effectiveness and formal guarantees:

4. Practical Implementations and Empirical Outcomes

Empirical work on zero-shot unlearning spans datasets including CIFAR-10/100, SVHN, ImageNet-1K, PACS, DomainNet, and foundation models such as CLIP and T5-based LLMs. Notable implementation and outcome patterns:

  • CLIP-specific frameworks: Closed-form nullspace projections reduce forget-class accuracy with limited loss in retain accuracy, and improve membership inference attack (MIA) scores relative to strong synthetic-data or iterative baselines (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025, Kravets et al., 2024).
  • Vision classifiers: Subspace-constrained (ZS-PAG) and proxy-based methods suppress forget-class test accuracy while retaining strong accuracy on the remaining classes (reported on CIFAR-100), outperforming data-free or random-label baselines (Chen et al., 29 Jul 2025, Ahmed et al., 20 Aug 2025).
  • Discrete codebook (DKVB, CodeUnlearn): Masking a subset of codes suppresses accuracy on forget classes with limited loss on retain classes, at near-zero computational cost (Shah et al., 2023, Wu et al., 2024).
  • TTS and speaker unlearning: Inference-time steering (TruS) reduces the speaker-similarity measure SIM-SO on opt-out speakers while preserving word error rates, matching the performance of high-cost retraining methods (Lee et al., 28 Jan 2026).
  • Federated and personalized settings: Jellyfish achieves full erasure on the forget set and recovers close to the original retain accuracy using only proxy data, while ZK-APEX enables verifiable unlearning proofs with a speedup over retraining-based verification (Wang et al., 5 Apr 2026, Maheri et al., 9 Dec 2025).
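The codebook-masking result in the list above can be illustrated with a toy nearest-key bottleneck. This is a generic sketch, not the DKVB or CodeUnlearn implementation: the two-dimensional keys, the class-label values, and the helper names `nearest_code` and `predict` are hypothetical, and the rule "mask every code hit by forget inputs" is an assumption chosen for simplicity.

```python
import numpy as np

def nearest_code(features, keys, active):
    """Index of the closest active key for every feature vector."""
    dists = ((features[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    dists[:, ~active] = np.inf                  # masked codes can never be selected
    return dists.argmin(axis=1)

def predict(features, keys, values, active):
    """Classify by looking up the value stored at the nearest active key."""
    return values[nearest_code(features, keys, active)]

# Six codes: two per class, stored as (key, class label) pairs.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0],
                 [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])
values = np.array([0, 0, 1, 1, 2, 2])
active = np.ones(len(keys), dtype=bool)

forget_inputs = np.array([[4.9, 5.1], [4.8, 5.9]])   # class 1 is to be forgotten
retain_inputs = np.array([[0.1, 0.2], [9.9, 0.8]])   # classes 0 and 2

before_forget = predict(forget_inputs, keys, values, active)
before_retain = predict(retain_inputs, keys, values, active)

# Unlearning step: mask every code that forget inputs select. No gradients needed.
masked = active.copy()
masked[np.unique(nearest_code(forget_inputs, keys, active))] = False

after_forget = predict(forget_inputs, keys, values, masked)
after_retain = predict(retain_inputs, keys, values, masked)
```

Masking touches only a boolean array, which is why the cost is negligible compared with any gradient-based update. Retain-class inputs keep their original codes, while forget-class inputs fall through to the nearest remaining code.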

5. Limitations, Trade-Offs, and Extensions

Zero-shot unlearning methods, despite their efficiency and privacy alignment, face recurring challenges:

6. Emerging Directions and Applications

Research is progressively advancing toward:

7. Representative Methods and Empirical Results

Method/Domain | Mechanism | Retain Acc Impact | Forget Acc Drop | MIA/Privacy
--- | --- | --- | --- | ---
CLIP Nullspace Proj. (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025) | Feature projection | … | … | MIA … points
ZS-PAG (Chen et al., 29 Jul 2025) | Proxy subspace/PGD + projected update | … | … | matches retrain
DKVB (sparse code) (Shah et al., 2023) | Mask codebook entries | … | to … | matches SCRUB
TruS (TTS) (Lee et al., 28 Jan 2026) | Inference-time identity steering | no loss | SIM drops | Spk-ZRF: …
Jellyfish (Fed) (Wang et al., 5 Apr 2026) | Noise proxies + channel disentanglement | … | to … | MIA … retrain
ZK-APEX (Maheri et al., 9 Dec 2025) | Mask + group-OBS + ZK proof | … recovery | … drop | Verifiably safe

These empirical results indicate that, when properly designed, zero-shot unlearning mechanisms can deliver targeted erasure and strong retention with minimal computational and data overhead across vision-language, speech, and federated settings.
