Zero-Shot Unlearning

Updated 3 July 2026

Zero-shot unlearning removes data-induced knowledge from trained models without retraining on the complete dataset, in support of regulatory compliance.
It employs methodologies such as closed-form projections, adversarial proxy synthesis, and gradient manipulations to target and erase specific information while retaining overall functionality.
Empirical results on vision-language models, TTS, and federated systems report near-complete forgetting and strong privacy-attack resistance with minimal performance loss.

Zero-shot unlearning is a family of techniques for selectively erasing data-induced knowledge from trained machine learning models, under the stringent constraint that no access to the original training data (except potentially the forget set) is permitted during the unlearning process. This paradigm addresses critical requirements for privacy, regulatory compliance (e.g., GDPR “right to be forgotten”), model decontamination, and downstream risk mitigation in both foundation models and domain-specific neural architectures. Zero-shot unlearning encompasses a variety of computational strategies including closed-form feature-space projections, adversarial proxy synthesis, gradient or subspace manipulations, and architectural design for post hoc instance removal. The core technical challenge is to precisely “forget” specific information while maintaining model fidelity on all unrelated tasks, often in highly overparameterized, multimodal, or structured prediction settings.

1. Foundational Principles and Problem Settings

Zero-shot unlearning is formalized as the process of transforming a pretrained model $f_\theta$ into a new model $f_{\theta'}$ such that the influence of a designated forget set $\mathcal{D}_f$ (typically a class, individual, or statistical subset) is effectively erased, while utility on the retain set $\mathcal{D}_r$ is preserved. Critically, access to $\mathcal{D}_r$, which conventional machine unlearning requires for re-optimization or influence estimation, is explicitly disallowed. In the strictest versions, only model weights and metadata about $\mathcal{D}_f$ (e.g., class labels or prompt text) are available (Chundawat et al., 2022, Chen et al., 29 Jul 2025).
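One way to write this setting down, as a sketch rather than a definition taken from the cited papers, is to compare the unlearned model with an oracle retrained on the retain set only. The loss $\mathcal{L}$, the unlearning operator $\mathcal{U}$, the forget-set metadata $m_f$, the discrepancy $d$, and the tolerance $\epsilon$ are notation assumed here for illustration:

```latex
\theta_r^{\ast} = \arg\min_{\theta}\ \mathcal{L}(\theta;\ \mathcal{D}_r),
\qquad
\theta' = \mathcal{U}(\theta,\ m_f),
\qquad
d\big(f_{\theta'},\ f_{\theta_r^{\ast}}\big) \le \epsilon .
```

The zero-shot constraint shows up in the arguments of $\mathcal{U}$: it sees the weights $\theta$ and information about $\mathcal{D}_f$, but never $\mathcal{D}_r$, even though the target $\theta_r^{\ast}$ is defined through $\mathcal{D}_r$.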

Contemporary frameworks extend zero-shot unlearning across multiple modalities and deployment settings:

Multimodal vision–language models (e.g., CLIP): Erasure of classes or domains via subspace projections in embedding space (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025, Kravets et al., 2024, Zhang et al., 3 Jun 2025).
Text-to-speech and biometrics: Preventing unauthorized voice synthesis by zero-shot steering of hidden activations or randomized teacher–student protocols (Lee et al., 28 Jan 2026, Kim et al., 27 Jul 2025).
Structured or codebook-based architectures in LLMs: Deletion of discrete activation codes tied to unwanted concepts (Wu et al., 2024, Shah et al., 2023).
Federated and personalized models: Edge-device or client-specific zero-shot updates with verifiable proofs (Maheri et al., 9 Dec 2025, Wang et al., 5 Apr 2026).
Domain adaptation: Erasure of source-exclusive knowledge in deployed source-free models (Devalapally et al., 9 Apr 2026).

The zero-shot constraint is distinct from “data-free” settings, as the latter may permit some retained proxy samples or distillation from intermediate representations.

2. Methodological Taxonomy of Zero-Shot Unlearning

Zero-shot unlearning strategies divide into several major classes:

2.1 Closed-form Feature-Space Projections

Nullspace/Orthogonal projections: By constructing an orthonormal basis for the subspace spanned by the forget set (e.g., text and/or visual prototypes in CLIP), unlearning is achieved by projecting features onto the orthogonal complement of this span (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025). The core operator is $P = I - UU^\top$, where the columns of $U$ form the forget-subspace basis. This method enables efficient, data-free erasure at test time, without retraining.
Partial (soft) projection: A linear transform $W$ is selected to minimize a joint functional that penalizes projections along forget directions while preserving retain-class structure, yielding a tunable tradeoff (Mishra et al., 16 Dec 2025).
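The hard projection $P = I - UU^\top$ can be sketched in a few lines of NumPy. This is a toy illustration, not the cited implementation: the function name `forget_projector`, the random 8-dimensional "prototypes", and the use of QR to obtain the orthonormal basis are assumptions, and the prototypes are assumed to be linearly independent.

```python
import numpy as np

def forget_projector(prototypes: np.ndarray) -> np.ndarray:
    """Return P = I - U U^T, with U an orthonormal basis of span(prototypes).

    prototypes: (k, d) array, one forget-class embedding per row
    (hypothetical input; rows are assumed linearly independent).
    """
    # Reduced QR of the (d, k) matrix gives k orthonormal columns
    # spanning the forget subspace.
    U, _ = np.linalg.qr(prototypes.T)
    return np.eye(prototypes.shape[1]) - U @ U.T

rng = np.random.default_rng(0)
forget_protos = rng.normal(size=(2, 8))   # two toy forget prototypes in 8-d
P = forget_projector(forget_protos)

feature = rng.normal(size=8)              # any embedding to be "cleaned"
cleaned = P @ feature                     # component along forget span removed
```

After projection, `cleaned` has zero inner product with every forget prototype, so a similarity-based classifier can no longer score those classes from the removed directions.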

2.2 Adversarial Proxy and Subspace Modeling

Proxy data generation: When $\mathcal{D}_r$ is inaccessible, adversarial perturbations of the forget set are optimized to cross the decision boundary into surrogate classes, forming a proxy for the retained distribution (Chen et al., 29 Jul 2025). Singular value decomposition (SVD) is then used to identify the retained-feature subspace, enabling subspace-constrained gradient updates that prevent over-unlearning on $\mathcal{D}_r$.
Statistical estimation of Hessians or Fisher matrices: For parameter-centric models with convex losses, source-free unlearning can estimate the Hessian of the remaining data via second-order Taylor expansions using only small perturbations and loss differences on $\mathcal{D}_f$ (Ahmed et al., 20 Aug 2025). Newton-type updates are then computed in closed form.
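A minimal NumPy sketch of a subspace-constrained update, assuming a single linear layer and proxy features that are already available as a matrix. It is a simplified illustration of projecting updates away from a retained-feature subspace found by SVD, not the procedure of the cited papers; `retained_basis`, `project_update`, the rank, and the step size are hypothetical.

```python
import numpy as np

def retained_basis(proxy_features: np.ndarray, rank: int) -> np.ndarray:
    """Top right-singular vectors of the (n, d) proxy feature matrix."""
    _, _, Vt = np.linalg.svd(proxy_features, full_matrices=False)
    return Vt[:rank].T                      # shape (d, rank)

def project_update(grad_W: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Remove the part of the update that acts on the retained subspace."""
    return grad_W - (grad_W @ V) @ V.T

rng = np.random.default_rng(0)
# Toy proxy features that lie exactly in a 3-d subspace of R^10.
proxy = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 10))
V = retained_basis(proxy, rank=3)

W = rng.normal(size=(4, 10))                # linear layer: outputs = x @ W.T
grad = rng.normal(size=(4, 10))             # stand-in for a forgetting gradient
W_new = W - 0.1 * project_update(grad, V)   # constrained update step
```

Because the projected update annihilates every vector in the span of `V`, the layer's outputs on the proxy features are unchanged by the step. In practice the proxy subspace is only an estimate, so this protection is approximate.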

2.3 Data Synthesis, Distillation, and Decoupled Representations

Synthetic sample and generator-based approaches: Generative Feedback Networks (GFNs) synthesize optimal erasure samples (OES) to maximize loss on forget classes, driving aggressive forgetting while a secondary “recovery” phase restores utility on the tiny available retained set (Song et al., 17 Nov 2025). Lipschitz regularization on both input and text embeddings in CLIP achieves similar effects (Kravets et al., 2024), as does direct local smoothing (Foster et al., 2024).
Discrete key–value bottleneck and codebook intervention: Discrete representational codes activated by the forget set are masked at inference, ensuring immediate zero-shot unlearning without retraining (Shah et al., 2023, Wu et al., 2024). Similar logic underpins “key deletion” architectures designed for instant memory erasure (Laguna et al., 16 Mar 2026).
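Codebook masking can be illustrated with a toy nearest-key lookup. The class `KeyValueBottleneck`, its methods, and the 2-d keys below are hypothetical and far simpler than the cited architectures; the sketch only shows that forgetting reduces to disabling the codes that the forget inputs activate.

```python
import numpy as np

class KeyValueBottleneck:
    """Toy discrete bottleneck: each input is routed to its nearest active key."""

    def __init__(self, keys: np.ndarray, values: np.ndarray):
        self.keys = keys                              # (K, d) code vectors
        self.values = values                          # (K,) stored outputs
        self.active = np.ones(len(keys), dtype=bool)  # mask of usable codes

    def nearest(self, x: np.ndarray) -> int:
        dists = np.linalg.norm(self.keys - x, axis=1)
        dists = np.where(self.active, dists, np.inf)  # masked keys never win
        return int(np.argmin(dists))

    def predict(self, x: np.ndarray):
        return self.values[self.nearest(x)]

    def forget(self, forget_inputs) -> set:
        # Collect the codes first, then mask them in one step.
        codes = {self.nearest(x) for x in forget_inputs}
        self.active[list(codes)] = False
        return codes

keys = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
model = KeyValueBottleneck(keys, values=np.array([0, 1, 2]))

forget_x = np.array([4.8, 5.1])                       # routed to key 1
retain_x = [np.array([0.2, -0.1]), np.array([9.5, 0.3])]
before = [model.predict(x) for x in retain_x]
masked = model.forget([forget_x])
after = [model.predict(x) for x in retain_x]
```

No weights are updated: the only state change is the boolean mask, which is why this family of methods has near-zero unlearning cost. The sketch does not handle the edge case where every key is masked.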

2.4 Network Path Disruption and Relevance Analysis

Layer-wise relevance analysis (LRA): Highly relevant neurons for the forget class are detected via backward relevance propagation using only auxiliary proxies, and their outgoing weights are dropped or re-randomized (neuronal path perturbation, NPP). This severs classification paths while preserving utility (Chang et al., 2024).
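The idea of locating and cutting class-specific paths can be sketched on a two-layer network. The relevance score here is a simplified activation-times-weight contribution, not full layer-wise relevance propagation, and the network sizes, `forget_class`, and the choice of dropping three neurons are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))       # hidden x input weights
W2 = rng.normal(size=(5, 16))       # classes x hidden weights
proxy_x = rng.normal(size=(32, 8))  # auxiliary proxy inputs (hypothetical)
forget_class = 2

def hidden(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ W1.T, 0.0)            # ReLU activations

def neuron_relevance(x: np.ndarray, W_out: np.ndarray, cls: int) -> np.ndarray:
    # Mean contribution of each hidden neuron to the forget-class logit.
    return (hidden(x) * W_out[cls]).mean(axis=0)

relevance = neuron_relevance(proxy_x, W2, forget_class)
top = np.argsort(relevance)[-3:]                # three most relevant neurons

W2_unlearned = W2.copy()
W2_unlearned[:, top] = 0.0                      # drop their outgoing weights

before = (hidden(proxy_x) @ W2.T)[:, forget_class].mean()
after = (hidden(proxy_x) @ W2_unlearned.T)[:, forget_class].mean()
```

The mean forget-class logit changes by exactly the summed relevance of the dropped neurons. Zeroing whole columns also affects other classes that use those neurons, which is the utility trade-off such methods have to manage.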

2.5 Inference-Time and Structural Approaches

Inference-time steering: In TTS, zero-shot unlearning occurs via dynamic, layer-selective subtraction of speaker-specific components from hidden activations at inference, suppressing identity with no retraining (Lee et al., 28 Jan 2026).
Unlearning by design: Models such as MUNKEY are architected from the start for key-based instant forgetting, sidestepping gradient updates altogether (Laguna et al., 16 Mar 2026).
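Inference-time steering can be sketched as subtracting a direction from hidden activations at selected layers. This is a generic activation-steering illustration, not the cited TTS method: `steer`, `run_layers`, the random `speaker_dir`, and the toy tanh layers are all hypothetical.

```python
import numpy as np

def steer(hidden: np.ndarray, speaker_dir: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Subtract the component of each hidden vector along a speaker direction."""
    unit = speaker_dir / np.linalg.norm(speaker_dir)
    return hidden - strength * np.outer(hidden @ unit, unit)

def run_layers(x, weights, steer_layers, speaker_dir):
    """Forward pass that steers activations only at the selected layers."""
    h = x
    for i, W in enumerate(weights):
        h = np.tanh(h @ W)
        if i in steer_layers:
            h = steer(h, speaker_dir)
    return h

rng = np.random.default_rng(0)
weights = [rng.normal(size=(6, 6)) for _ in range(3)]  # toy 3-layer stack
speaker_dir = rng.normal(size=6)                       # assumed identity direction
frames = rng.normal(size=(4, 6))                       # 4 time steps, 6-d hidden

plain = run_layers(frames, weights, steer_layers=set(), speaker_dir=speaker_dir)
steered = run_layers(frames, weights, steer_layers={2}, speaker_dir=speaker_dir)
```

With `strength=1.0` the steered output of the chosen layer has no component along the speaker direction; smaller values give partial suppression. The model weights themselves are never modified.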

3. Formal Guarantees, Theoretical Results, and Metrics

Zero-shot unlearning research addresses both empirical effectiveness and formal guarantees:

Differential privacy and indistinguishability: Some frameworks define success via the closeness of the output or parameter distribution to that of an oracle model retrained without the forget set (Foster et al., 2024, Chundawat et al., 2022).
Bounding over-unlearning: Approaches like ZS-PAG prove that subspace-projected gradient updates retain performance on $\mathcal{D}_r$ under Polyak–Łojasiewicz conditions (Chen et al., 29 Jul 2025).
Safety from attacks: Membership-inference (MIA) and model inversion attacks are standard evaluation metrics; state-of-the-art zero-shot techniques (e.g., LRA+NPP, proxy-adversarial methods, nullspace projections) achieve post-unlearning MIA scores near random-guess on the forget set, matching retrained baselines (Mishra et al., 16 Dec 2025, Chang et al., 2024, Shah et al., 2023).
Novel metrics: Speaker-zero retrain forgetting (spk-ZRF) quantifies speaker identity randomness in TTS; Anamnesis Index (AIN) measures recoverability of forget-set performance under retraining (Kim et al., 27 Jul 2025, Chundawat et al., 2022).
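To make "MIA scores near random-guess" concrete, the sketch below scores a simple loss-threshold membership attack. It is an illustrative attack, not the evaluation protocol of any cited paper; the function name and the toy loss values are assumptions. Forget-set samples play the role of members and held-out samples the role of non-members.

```python
import numpy as np

def mia_balanced_accuracy(member_losses, nonmember_losses) -> float:
    """Best balanced accuracy of the rule 'loss <= t means member'."""
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    best = 0.5                                   # random-guess floor
    for t in np.concatenate([member_losses, nonmember_losses]):
        tpr = np.mean(member_losses <= t)        # members correctly flagged
        fpr = np.mean(nonmember_losses <= t)     # non-members wrongly flagged
        best = max(best, 0.5 + 0.5 * (tpr - fpr))
    return float(best)

# Before unlearning: forget samples have clearly lower loss than unseen data.
leaky = mia_balanced_accuracy([0.1, 0.2], [1.0, 2.0])
# After ideal unlearning: the two loss distributions coincide.
clean = mia_balanced_accuracy([0.5, 1.0, 1.5], [0.5, 1.0, 1.5])
```

A value near 1.0 means the attacker can identify forget-set samples from their losses, while a value near 0.5 means the forget set is indistinguishable from unseen data under this attack.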

4. Practical Implementations and Empirical Outcomes

Empirical work on zero-shot unlearning spans datasets including CIFAR-10/100, SVHN, ImageNet-1K, PACS, DomainNet, and foundation models such as CLIP and T5-based LLMs. Notable implementation and outcome patterns:

CLIP-specific frameworks: Closed-form nullspace projections lower forget-class accuracy with only limited loss in retain accuracy, and improve MIA scores relative to strong synthetic-data or iterative baselines (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025, Kravets et al., 2024).
Vision classifiers: Subspace-constrained (ZS-PAG) and proxy-based methods achieve low forget-class test accuracy and strong retention on CIFAR-100, outperforming data-free or random-label baselines (Chen et al., 29 Jul 2025, Ahmed et al., 20 Aug 2025).
Discrete codebook (DKVB, CodeUnlearn): Masking a fraction of the codes suppresses accuracy on the forget classes while bounding the loss on retain classes, at near-zero computational cost (Shah et al., 2023, Wu et al., 2024).
TTS and speaker unlearning: Inference-time steering (TruS) reduces the speaker similarity measure SIM-SO on opt-out speakers while preserving word error rates, matching the performance of high-cost retraining methods (Lee et al., 28 Jan 2026).
Federated and personalized settings: Jellyfish achieves full erasure on $\mathcal{D}_f$ and recovers close to the original $\mathcal{D}_r$ accuracy using only proxy data, while ZK-APEX enables verifiable unlearning proofs with a speedup over retraining-based verification (Wang et al., 5 Apr 2026, Maheri et al., 9 Dec 2025).

5. Limitations, Trade-Offs, and Extensions

Zero-shot unlearning methods, despite their efficiency and privacy alignment, face recurring challenges:

Retain data unavailability: Proxy generation for $\mathcal{D}_r$ or subspace estimation may degrade if the adversarial (or synthetic) proxy diverges from the real remaining data distribution (Chen et al., 29 Jul 2025, Ahmed et al., 20 Aug 2025).
Over-unlearning: Excessive smoothing, aggressive projections, or overbroad key masking can erode utility on non-target classes or tasks (Foster et al., 2024, Mishra et al., 16 Dec 2025).
Scalability and architectural assumptions: SDP-based Hessian estimation scales poorly to larger problem sizes, while neural collapse or codebook approaches are architecture-dependent and may require end-to-end retraining for capacity adaptation (Almudévar et al., 29 Jan 2026, Shah et al., 2023).
Theoretical guarantees: Most methods guarantee only local distributional indistinguishability or surrogate optimality (e.g., via influence estimates), with formal certified unlearning guarantees still limited (Maheri et al., 9 Dec 2025, Almudévar et al., 29 Jan 2026).
Task granularity and generalization: Instance-level forgetting is more challenging than class-level, and extensions beyond classification (e.g., detection, segmentation, open-vocabulary generative tasks) are an open arena (Wu et al., 2024, Devalapally et al., 9 Apr 2026).

6. Emerging Directions and Applications

Research is progressively advancing toward:

Continual and streaming zero-shot unlearning: Sequential handling of multiple, potentially overlapping forget requests, with utility decay management (Maheri et al., 9 Dec 2025, Chundawat et al., 2022).
Federated, edge, and verifiable protocols: Cryptographically sound proofs of unlearning with low communication and compute overhead, suited to privacy-oriented deployments (Maheri et al., 9 Dec 2025, Wang et al., 5 Apr 2026).
Hybrid strategies and cross-modal extensions: Combining proxy-based, projection, and codebook interventions with hybrid access to minimal retained data or synthetic anchors for robust utility (Song et al., 17 Nov 2025, Mishra et al., 16 Dec 2025).
Broader impact and regulatory utility: Deployment for right-to-be-forgotten compliance, model decontamination, bias mitigation, and remediation of unwanted correlations or offensive content (Mishra et al., 16 Dec 2025, Devalapally et al., 9 Apr 2026).
Privacy and attack resistance: Systematic evaluation against membership inference, model inversion, and reactivation attacks is standard in state-of-the-art baselines (Mishra et al., 16 Dec 2025, Shah et al., 2023, Chang et al., 2024).

7. Representative Methods and Empirical Results

Method/Domain	Mechanism	Retain Acc Impact	Forget Acc Drop	MIA/Privacy
CLIP Nullspace Proj. (Mishra et al., 16 Dec 2025, Mishra et al., 16 Dec 2025)	Orthogonal feature projection ($P = I - UU^\top$)	[…]	[…]	MIA […] points
ZS-PAG (Chen et al., 29 Jul 2025)	Proxy subspace/PGD + projected update	[…]	[…]	matches retrain
DKVB (sparse code) (Shah et al., 2023)	Mask codebook entries	[…]	to […]	matches SCRUB
TruS (TTS) (Lee et al., 28 Jan 2026)	Inference-time identity steering	no loss	SIM drops	Spk-ZRF: […]
Jellyfish (Fed) (Wang et al., 5 Apr 2026)	Noise proxies + channel disentanglement	[…]	to […]	MIA […] retrain
ZK-APEX (Maheri et al., 9 Dec 2025)	Mask + group-OBS + ZK proof	[…] recovery	[…] drop	Verifiably safe

These results indicate that well-designed zero-shot unlearning mechanisms can deliver targeted erasure and strong retention with minimal computational and data overhead.