
Face Identity Unlearning

Updated 22 December 2025
  • Face Identity Unlearning is a technique that removes learned identity features from models to ensure privacy while keeping performance on remaining data.
  • It encompasses various approaches—including supervised, unsupervised, and one-shot methods—that balance the trade-off between erasing specific identities and maintaining utility.
  • Evaluation metrics like NoMUS and UES are used to assess both the extent of identity forgetting and the retention of task performance in classification, retrieval, and generative models.

Face identity unlearning refers to the process of modifying machine learning models (especially facial recognition, retrieval, or generative systems) so that the influence or memory of specific identities is effectively erased while overall task utility is retained. This capability is integral to privacy-preserving machine learning, regulatory compliance (e.g., data-deletion requests), and mitigating the risks of unauthorized use of biometric data. Face identity unlearning encompasses supervised, unsupervised, discriminative (classification, retrieval), and generative (GANs, diffusion) approaches, each with distinct methodological and evaluation frameworks.

1. Problem Formulation and Theoretical Frameworks

Two principal formulations govern face identity unlearning: instance-level unlearning in discriminative models, and identity erasure in generative models. In both settings, the objective is to produce a model that (i) performs as if certain identities or samples were never observed in training, and (ii) preserves utility—classification accuracy, retrieval metrics, or image quality—on non-forgotten data.

Given a model trained on dataset $\mathcal{D}$, with subsets $\mathcal{D}_{\mathrm{forget}}$ (identities/data to be forgotten) and $\mathcal{D}_{\mathrm{retain}}$ (to be retained), the goal is to produce parameters $\theta_{\mathrm{unlearn}}$ such that

$$\theta_{\mathrm{unlearn}} \approx \underset{\theta}{\arg\min}\; \mathcal{L}(\theta;\, \mathcal{D}_{\mathrm{retain}})$$

but without retraining ab initio. Strict unlearning would match the retrained model up to a specified distance metric, but most practical frameworks operate under approximate forgetting guarantees (Choi et al., 2023).

Evaluation criteria include:

  • Utility: task performance on $\mathcal{D}_{\mathrm{test}}$, e.g., $\mathrm{Utility}(\theta) = P_{(x,y)\sim\mathcal{D}_{\mathrm{test}}}[\theta(x) = y]$.
  • Forgetting strength: reducing membership-inference or re-identification success on $\mathcal{D}_{\mathrm{forget}}$, often measured by a membership-inference classifier's accuracy or by cluster compactness in retrieval models (Choi et al., 2023, Zakharov, 15 Dec 2025).
  • NoMUS (Normalized Machine-Unlearning Score): trades off utility and forgetting (Choi et al., 2023); a sketch of such an aggregate score follows this list.
  • Unlearning Efficiency Score (UES): balances retention and forgetting efficiency (Shivam et al., 23 Sep 2025).
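
As a concrete illustration of how these aggregate scores combine the two axes, here is a minimal Python sketch, assuming utility is plain test accuracy and forgetting is measured by how far a membership-inference attack sits from chance; the `lam` weighting is illustrative rather than the constant used in NoMUS:

```python
def forgetting_score(mia_accuracy: float) -> float:
    """Distance of a membership-inference attack from random guessing.

    0.0 means the attacker is at chance (ideal forgetting);
    1.0 means a perfect attack (no forgetting).
    """
    return 2.0 * abs(mia_accuracy - 0.5)


def aggregate_unlearning_score(test_accuracy: float, mia_accuracy: float,
                               lam: float = 0.5) -> float:
    """NoMUS-style convex combination of utility and forgetting."""
    return lam * test_accuracy + (1.0 - lam) * (1.0 - forgetting_score(mia_accuracy))
```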

2. Supervised, Unsupervised, and One-Shot Unlearning Paradigms

Supervised unlearning requires identity or class labels and full access to training batches. Notable methods include:

  • Retraining: rebuilds the model on $\mathcal{D}_{\mathrm{retain}}$, optimally removing target influences but at high computational cost.
  • Fine-tuning: standard gradient descent on $\mathcal{D}_{\mathrm{retain}}$, which is typically insufficient for full identity erasure due to residual memorization (Choi et al., 2023).

Unsupervised methods such as CURE for facial recognition operate solely on sample-level signals, using clustering and pseudo-labels derived from embeddings to structure erasure (Shivam et al., 23 Sep 2025). CURE employs K-means clustering of pre-computed teacher model embeddings to assign pseudo-labels to both "forget" and "retain" sets, integrating margin-based contrastive, cosine, and KL-divergence losses to push forgotten samples into maximally distant regions of embedding space, while stabilizing retained samples.
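
A minimal sketch of CURE's pseudo-labeling step, assuming embeddings precomputed by a frozen teacher model and using scikit-learn's KMeans; the margin-contrastive, cosine, and KL-divergence losses that consume these labels are not reproduced here:

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label(teacher_embeddings: np.ndarray,
                 n_clusters: int = 64, seed: int = 0) -> np.ndarray:
    """Assign pseudo-identity labels by K-means over teacher embeddings.

    L2-normalizing first makes Euclidean K-means behave like clustering
    by cosine similarity, matching hyperspherical face embeddings.
    n_clusters is illustrative and would be tuned per dataset.
    """
    normed = teacher_embeddings / np.linalg.norm(
        teacher_embeddings, axis=1, keepdims=True)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(normed)
```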

One-shot unlearning, represented by MetaUnlearn, assumes the absence of full training data (e.g., per privacy regulations). Here, the unlearning operator accesses only a single “portrait” for each target identity. MetaUnlearn meta-learns a surrogate forgetting loss that, when applied to these single images, steers the model weights toward a $\theta_{\mathrm{unlearn}}$ that closely approximates retraining, as measured by the "tug-of-war" score (ToW), even when the support sample is not centrally representative of the identity's training distribution (Min et al., 16 Jul 2024).
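
The ToW score compares the unlearned model against a from-scratch retrained reference through accuracy gaps on the retain, forget, and test splits; the multiplicative form below is one common formulation and is assumed here for illustration:

```python
def tow_score(acc_unlearned: dict, acc_retrained: dict) -> float:
    """Tug-of-war (ToW)-style score in [0, 1]: 1.0 means the unlearned
    model matches the retrained reference on all three data splits.

    Both arguments map 'retain', 'forget', and 'test' to accuracies.
    """
    score = 1.0
    for split in ("retain", "forget", "test"):
        score *= 1.0 - abs(acc_unlearned[split] - acc_retrained[split])
    return score
```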

3. Face Identity Unlearning for Classification and Retrieval

Classification Benchmarks and Algorithms

Canonical benchmarks such as MUFAC (age-band classification) and MUCAC (celebrity attribute classification) partition identities at the instance level to rigorously evaluate unlearning methods, ensuring no leakage across forget/retain/test/unseen sets (Choi et al., 2023). Evaluated algorithms include:

  • NegGrad / Advanced NegGrad: apply loss maximization (gradient ascent) on the forget set, optionally combined with gradient descent on the retain set to induce a Pareto-optimal tradeoff (Choi et al., 2023); a sketch follows this list.
  • CF-k (Catastrophic Forgetting-k): freezes the early layers and fine-tunes only the final k layers on the retain set, relying on catastrophic forgetting to dilute the forget set's influence.
  • SCRUB and UNSIR: teacher-student distillation that selectively diverges from the teacher on the forget set (SCRUB), and impair-then-repair unlearning with learned error-maximizing noise (UNSIR).
  • CURE (unsupervised): as described above, achieves state-of-the-art UES, especially in scenarios where identity labels are unavailable or low-quality images are targeted for erasure (Shivam et al., 23 Sep 2025).
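
As referenced in the first item above, a minimal PyTorch sketch of an Advanced-NegGrad-style update, assuming a standard classifier and cross-entropy loss; the ascent/descent weight `alpha` is illustrative:

```python
import torch.nn.functional as F

def neggrad_step(model, optimizer, forget_batch, retain_batch, alpha=0.5):
    """One joint unlearning step: gradient ascent on the forget batch,
    gradient descent on the retain batch (Advanced-NegGrad style).

    In practice the ascent term is usually clipped or annealed to keep
    the optimization from diverging.
    """
    xf, yf = forget_batch
    xr, yr = retain_batch
    optimizer.zero_grad()
    loss_forget = F.cross_entropy(model(xf), yf)  # to be maximized
    loss_retain = F.cross_entropy(model(xr), yr)  # to be minimized
    (-alpha * loss_forget + (1.0 - alpha) * loss_retain).backward()
    optimizer.step()
    return loss_forget.item(), loss_retain.item()
```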

Retrieval Systems and Embedding Dispersion

Face retrieval models learn compact, discriminative identity clusters in a hyperspherical embedding space. Unlearning in these systems requires collapsing the intra-identity compactness of targeted identities while preserving the rest of the space. The embedding-dispersion approach optimizes a margin-based “dispersion loss” that directly pushes the embeddings of each forgotten identity apart, destroying their tight clusters and rendering them unretrievable (Zakharov, 15 Dec 2025). Quantitatively, the compactness score for forgotten identities drops from $\sim$0.62 to $\sim$0.09 post-unlearning, with minimal (<2%) loss in retain-set retrieval performance.
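
A minimal sketch of such a margin-based dispersion objective, assuming a batch of embeddings that all belong to a single forgotten identity; the margin value and exact form are illustrative rather than the published loss:

```python
import torch
import torch.nn.functional as F

def dispersion_loss(forget_embeddings: torch.Tensor,
                    margin: float = 0.3) -> torch.Tensor:
    """Penalize pairs of same-identity 'forget' embeddings whose cosine
    similarity exceeds the margin, pushing the identity cluster apart.
    Assumes at least two embeddings of the targeted identity per batch.
    """
    z = F.normalize(forget_embeddings, dim=1)
    sim = z @ z.t()  # pairwise cosine similarities
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    return F.relu(sim[off_diag] - margin).mean()
```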

4. Face Identity Unlearning in Generative Models

Generative identity unlearning addresses the risk of synthesizing images of specific individuals from latent codes. Two main lines have emerged:

  • GUIDE: Prevents the generator from reconstructing a target identity by mapping its latent code to a distinct, “anonymous” target via UFO (Un-Identifying Face On latent space) and LTU (Latent Target Unlearning). The framework employs local, adjacency-aware, and global preservation losses to ensure erasure is confined to the target identity and its local latent neighborhood without distribution collapse. A single image suffices to erase an identity cluster due to the informativeness of strong inversion (Seo et al., 16 May 2024); a schematic sketch of this latent-redirection idea follows the list.
  • SUGAR: Extends GUIDE to scalable, many-identity unlearning with ID-specific de-identification surrogates in latent space and a continual-learning regularizer for utility preservation. SUGAR learns a mapping for each target identity's latent direction, redirecting generations to plausible surrogates, and uses an EWC penalty to prevent catastrophic forgetting of retained identities. In large-scale tests ($N=200$), SUGAR retains much higher identity similarity (utility) and lower FID than prior approaches (Nguyen et al., 6 Dec 2025).
  • Text-to-Unlearn: Introduces cross-modal (text-prompt-based) unlearning for GANs, leveraging the CLIP embedding space to guide the generator away from the prompt direction. For identities, outputs are pushed toward the average face, with the effectiveness assessed by cosine similarity reductions in ArcFace space and BLIP-2/VQA-based alignment metrics (Nagasubramaniam et al., 1 Apr 2025).
  • Latent Space Disentanglement: Earlier methods construct dual-encoder architectures to explicitly factor identity from attributes, using adversarial and cycle-consistency losses in the latent space of fixed pre-trained generators (e.g., StyleGAN). At test time, swapping in new identity vectors effects de-identification (Nitzan et al., 2020).
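
As noted in the GUIDE item above, a schematic sketch of the latent-redirection idea shared by GUIDE and SUGAR, assuming a trainable generator, a frozen copy of it, an inverted latent `w_target` for the identity to erase, and a surrogate ("anonymous") latent `w_surrogate`; the single L1 preservation term stands in for the local, adjacency-aware, and global losses described above:

```python
import torch.nn.functional as F

def redirection_loss(generator, generator_frozen,
                     w_target, w_surrogate, w_retain_batch):
    """Schematic GUIDE/SUGAR-style objective.

    Erasure: the tuned generator should render the surrogate face at the
    target identity's latent. Preservation: outputs at retained latents
    should stay close to the frozen generator's outputs.
    """
    erase = F.l1_loss(generator(w_target), generator_frozen(w_surrogate))
    preserve = F.l1_loss(generator(w_retain_batch),
                         generator_frozen(w_retain_batch))
    return erase + preserve
```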

In diffusion models, ID$^2$Face achieves disentanglement via a conditional diffusion architecture, combining structured latent decomposition (identity/non-identity), cross-attention fusion, orthogonal identity mapping, and multi-component reconstruction losses. This enables direct, controllable anonymization at inference by sampling a random identity code orthogonal to the original (Yang et al., 28 Oct 2025).
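
A minimal sketch of that inference-time step, assuming a 1-D identity code: draw a Gaussian code and project out the original identity direction (one Gram-Schmidt step), yielding a random code orthogonal to the source identity:

```python
import torch

def orthogonal_identity_code(id_code: torch.Tensor) -> torch.Tensor:
    """Sample a unit-norm identity code orthogonal to `id_code` (1-D)."""
    u = id_code / id_code.norm()
    r = torch.randn_like(id_code)
    r = r - (r @ u) * u  # remove the component along the original identity
    return r / r.norm()
```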

5. Vision–Language and Foundation Model Unlearning

Face-identity unlearning has been evaluated in the context of vision-language models (VLMs) using controlled benchmarks (FIUBench) with synthetic faces and private textual attributes. Here, state-of-the-art algorithms include gradient ascent on the forget set, combined retention-forgetting objectives, and preference optimization (refusal fine-tuning). Evaluation leverages exact match on private keywords, the KS-test, membership-inference attacks, and adversarial extraction via paraphrased questions. All methods currently exhibit a strong tradeoff, with high-quality forgetting achievable only at a steep utility cost (Ma et al., 5 Nov 2024).
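
A minimal sketch of the exact-match leakage probe, assuming per-question generated answers paired with the private keywords that should have been forgotten; FIUBench's full protocol additionally runs KS-tests, membership-inference attacks, and paraphrase-based extraction:

```python
def exact_match_leakage(answers: list[str],
                        private_keywords: list[list[str]]) -> float:
    """Fraction of probes whose answer still contains any of its private
    keywords verbatim (case-insensitive); 0.0 means no exact-match leakage.
    """
    hits = sum(
        any(kw.lower() in ans.lower() for kw in kws)
        for ans, kws in zip(answers, private_keywords)
    )
    return hits / len(answers)
```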

6. Practical Guidelines, Evaluation, and Open Challenges

Practical deployment requires aligning the choice of method with available data access (supervised, unsupervised, or support-only), utility-forgetting tradeoffs, and computational constraints. Unlearning pipelines typically involve splitting training data, precomputing reference embeddings, clustering (where needed), iterative loss optimization combining both forgetting and retention components, and robust evaluation—including confidence/entropy shifts and membership-inference analysis (Choi et al., 2023, Shivam et al., 23 Sep 2025).
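
A skeleton of that loop structure, with the method-specific update and the evaluation metrics injected as callables; all names here are illustrative:

```python
def unlearning_run(model, forget_loader, retain_loader,
                   step_fn, eval_fns, epochs: int = 5):
    """Generic unlearning pipeline: iterate joint forgetting/retention
    updates, then report metrics. step_fn encapsulates the chosen method
    (e.g., a NegGrad-style step); eval_fns maps metric names to callables.
    """
    for _ in range(epochs):
        for forget_batch, retain_batch in zip(forget_loader, retain_loader):
            step_fn(model, forget_batch, retain_batch)
    return {name: fn(model) for name, fn in eval_fns.items()}
```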

For generative models, inversion and mapping architectures are often frozen, with parameter updates targeted to synthesis layers and surrogate-latent mapping networks (Seo et al., 16 May 2024, Nguyen et al., 6 Dec 2025). In diffusion-based anonymization, identity-decomposition and orthogonalization in latent space are mandatory for achieving true erasure (Yang et al., 28 Oct 2025).

Common evaluation metrics are:

  • Task utility (accuracy/mAP/F1).
  • Forget-score (distance from random assignment in membership inference).
  • NoMUS, UES, ToW (for aggregate trade-offs).
  • FID and ArcFace similarity for generative quality/identity.
  • Compactness score and centroid-level classification for retrieval (Zakharov, 15 Dec 2025); one plausible compactness definition is sketched after this list.
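
One plausible reading of the compactness score, assumed here to be the mean pairwise cosine similarity among a single identity's embeddings (the benchmarked definition may differ in detail):

```python
import torch
import torch.nn.functional as F

def compactness(embeddings: torch.Tensor) -> float:
    """Mean pairwise cosine similarity within one identity's embeddings.

    Values near 0.6 indicate a tight, retrievable cluster; values near 0
    indicate the cluster has been dispersed by unlearning.
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t()
    n = len(z)
    return ((sim.sum() - sim.diagonal().sum()) / (n * (n - 1))).item()
```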

Current limitations include:

  • Absence of strong formal unlearning guarantees in facial recognition settings.
  • Difficulty scaling to many identities or dynamic/unseen support requests without retraining.
  • Risk of residual information leakage, especially for high intra-identity variance cases or tightly entangled features.
  • Trade-off frontiers remain; perfect forgetting still degrades utility (Choi et al., 2023, Shivam et al., 23 Sep 2025).

Open problems encompass scalable, provably-private unlearning methods, robust defense against adaptive privacy attacks, and effective removal in multi-modal and foundation models (Ma et al., 5 Nov 2024).

7. Identity Adversarial Training and Functional De-Identification

Identity Adversarial Training (IAT) addresses implicit “shortcut” memorization of subject identity in feature learning for facial analysis tasks (e.g., AU detection). By applying a strong gradient-reversal layer against an identity classifier head during fine-tuning on large-scale ViT backbone models, IAT induces identity-invariant features, circumventing identity-based shortcut solutions. Optimal IAT deployment requires a linear identity head and large regularization weight to ensure effective gradient reversal. Empirically, IAT reduces identity extraction accuracy from 83% to ~28% (chance is 2.4%), with concomitant gains in target AU detection F1 (Ning et al., 15 Jul 2024).
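
A minimal PyTorch sketch of the gradient-reversal mechanism at the heart of IAT; the reversal weight `lam` is illustrative, with IAT calling for a large value and a linear identity head:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lam in the
    backward pass, training the backbone *against* the identity head."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=10.0):
    return GradReverse.apply(x, lam)

# Usage sketch: identity_logits = linear_identity_head(grad_reverse(features))
```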

Face identity unlearning thus represents a rapidly evolving, multi-faceted research field, spanning supervised, unsupervised, generative, and foundation model approaches, each with a rich ecosystem of tailored algorithms and rigorous privacy–utility evaluation metrics. The discipline is critical both for privacy guarantees in large-scale biometrics and for advancing the fundamental understanding of selective information erasure in deep learning systems.
