Dispersion-Based Unlearning Approach
- The paper introduces a method that disperses embeddings on a hypersphere to effectively erase specific face identities from retrieval systems.
- It employs both uniform and hard-dispersion loss functions to break compact clusters, significantly reducing retrieval metrics like mAP and R@1 for forgotten identities.
- The approach confines parameter updates to the forget set, preserving the global geometry and retrieval performance for retained identities.
A dispersion-based unlearning approach is a methodology for selectively erasing information associated with particular classes or identities from deep embedding-based retrieval systems. It operates by dispersing the embeddings of selected ("forget") identities over the hypersphere to prevent the formation of compact clusters, thereby rendering these identities unretrievable via standard similarity search while preserving model utility for all retained identities. This technique addresses privacy concerns and regulatory compliance in surveillance-oriented machine learning by providing a practical and effective means of face identity forgetting within state-of-the-art embedding architectures (Zakharov, 15 Dec 2025).
1. Problem Setting and Unlearning Objective
Given a pretrained face-embedding model $f_\omega$ that produces embeddings $x = f_\omega(I) \in \mathbb{R}^d$, the output vectors are $\ell_2$-normalized to the unit hypersphere, $\hat{x} = x / \|x\|_2$, so that $\|\hat{x}\|_2 = 1$. Let the complete dataset $\mathcal{D}$ contain $N$ distinct identities. This set is partitioned into a "forget" set $\mathcal{D}_f$ corresponding to the identities to be forgotten and a "retain" set $\mathcal{D}_r$ with the remaining identities.
The unlearning goal is formalized in retrieval terms:
- (A) For any forget-set query $\hat{x}_q$ and any other face $\hat{x}_p$ of the same identity vs. a negative $\hat{x}_n$ of a different identity, under the post-unlearning parameters $\omega'$ the retrieval ordering flips: $s_{\omega'}(\hat{x}_q, \hat{x}_p) < s_{\omega'}(\hat{x}_q, \hat{x}_n)$, where $s$ denotes cosine similarity, i.e., $s(\hat{x}_i, \hat{x}_j) = \hat{x}_i^\top \hat{x}_j$.
- (B) For all retained identities, the original retrieval orderings are preserved: $s_{\omega'}(\hat{x}_q, \hat{x}_a) > s_{\omega'}(\hat{x}_q, \hat{x}_b)$ whenever $s_{\omega}(\hat{x}_q, \hat{x}_a) > s_{\omega}(\hat{x}_q, \hat{x}_b)$ held before unlearning.
Cluster Compactness Score (CS):
For a set of identities (forget or retain), compactness is computed from the within-identity cosine similarities of the normalized embeddings.
Lower CS indicates a more dispersed (less compact) identity cluster.
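A minimal PyTorch sketch of the score, assuming CS is the mean pairwise cosine similarity within each identity, averaged over identities; the function name and the averaging details are illustrative assumptions, not the paper's code.

```python
import torch

def compactness_score(embeddings: torch.Tensor, labels: torch.Tensor) -> float:
    """Cluster compactness, assumed here to be the mean pairwise cosine
    similarity within each identity, averaged over identities.

    `embeddings`: (N, d) L2-normalized vectors; `labels`: (N,) identity ids.
    Lower values indicate more dispersed identity clusters.
    """
    per_identity = []
    for y in labels.unique():
        z = embeddings[labels == y]                 # embeddings of one identity
        n = z.shape[0]
        if n < 2:
            continue                                # compactness undefined for singletons
        sim = z @ z.T                               # pairwise cosine similarities
        mean_off_diag = (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
        per_identity.append(mean_off_diag)
    return torch.stack(per_identity).mean().item()
```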
2. Dispersion Losses and Mathematical Formulation
To directly erase identity structure, the dispersion approach draws mini-batches solely from $\mathcal{D}_f$.
- For a mini-batch $\{(\hat{x}_i, y_i)\}_{i=1}^{B}$ of normalized embeddings and identity labels, define for each anchor $i$ the positive set $P_i = \{\, j \neq i : y_j = y_i \,\}$ and the set of anchors with at least one positive, $\mathcal{N}_{\mathrm{pos}} = \{\, i : P_i \neq \emptyset \,\}$.
Uniform Dispersion Loss:
$$\mathcal{L}_{\mathrm{disp}} = \frac{1}{|\mathcal{N}_{\mathrm{pos}}|} \sum_{i \in \mathcal{N}_{\mathrm{pos}}} \frac{1}{|P_i|} \sum_{j \in P_i} \max\!\bigl(0,\; m + \hat{x}_i^\top \hat{x}_j\bigr),$$
where $m$ is a margin hyperparameter. The hinge penalty activates whenever two same-identity embeddings are too close (cosine similarity above $-m$), driving within-class pairs further apart.
Hard-Dispersion Loss:
$$\mathcal{L}_{\mathrm{hard}} = \frac{1}{|\mathcal{N}_{\mathrm{pos}}|} \sum_{i \in \mathcal{N}_{\mathrm{pos}}} \max\!\bigl(0,\; m + \max_{j \in P_i} \hat{x}_i^\top \hat{x}_j\bigr).$$
This loss targets the most similar positive for each anchor, more aggressively disrupting the tightest intra-class links.
Combined Objective:
In the main experiments only the dispersion loss is optimized; no retain-set term contributes to the parameter update. Typical settings are margin $m = 0.2$, learning rate $10^{-4}$, and $T = 1000$ unlearning steps.
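Both losses reduce to a few tensor operations. The following PyTorch sketch computes either variant from a batch of normalized forget-set embeddings; the function name, masking details, and the zero-loss fallback for batches without positive pairs are implementation choices of this sketch, not the paper's code.

```python
import torch

def dispersion_loss(z: torch.Tensor, y: torch.Tensor, m: float = 0.2,
                    hard: bool = False) -> torch.Tensor:
    """Uniform (hard=False) or hard (hard=True) dispersion loss on a forget-set batch.

    `z`: (B, d) L2-normalized embeddings; `y`: (B,) identity labels.
    Hinge form max(0, m + cos(x_i, x_j)) over same-identity pairs.
    """
    sim = z @ z.T                                   # (B, B) cosine similarities
    same = (y[:, None] == y[None, :])               # same-identity mask
    same.fill_diagonal_(False)                      # exclude self-pairs
    has_pos = same.any(dim=1)                       # anchors with at least one positive
    if not has_pos.any():
        return z.sum() * 0.0                        # no positive pairs: zero loss
    if hard:
        # penalize only the most similar positive per anchor
        hardest = sim.masked_fill(~same, float("-inf")).max(dim=1).values
        per_anchor = torch.clamp(m + hardest, min=0.0)
    else:
        # average the hinge over all positives of each anchor
        hinge = torch.clamp(m + sim, min=0.0) * same
        per_anchor = hinge.sum(dim=1) / same.sum(dim=1).clamp(min=1)
    return per_anchor[has_pos].mean()
```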
3. Dispersion-Unlearning Algorithm
The operational steps are as follows:
```
Input: f_ω        -- pretrained face encoder
       𝒟_f        -- forget-set images
       m          -- margin (e.g. 0.2)
       lr         -- learning rate (e.g. 1e-4)
       B          -- batch size (32 or 160)
       T          -- number of unlearning steps (1000)
       use_hard?  -- hard-dispersion or uniform

for t in 1..T:
    Sample batch {(I_i, y_i)} from 𝒟_f
    x_i = f_ω(I_i)
    x̂_i = x_i / ∥x_i∥₂
    Build P_i and 𝒩_pos for the batch
    if use_hard?:
        L = (1/|𝒩_pos|) ∑_{i∈𝒩_pos} max(0, m + max_{j∈P_i} x̂_i·x̂_j)
    else:
        L = (1/|𝒩_pos|) ∑_{i∈𝒩_pos} (1/|P_i|) ∑_{j∈P_i} max(0, m + x̂_i·x̂_j)
    ω ← ω − lr · ∇_ω L

Return f_{ω'}
```
All gradients are restricted to the forget set $\mathcal{D}_f$; no training signal is backpropagated for samples in $\mathcal{D}_r$, so the geometry of retained identities remains stable.
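A hedged PyTorch sketch of this loop, reusing the `dispersion_loss` helper from Section 2; plain SGD mirrors the update $\omega \leftarrow \omega - \mathrm{lr} \cdot \nabla_\omega L$, and the remaining details (data recycling, device handling) are assumptions of the sketch rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def unlearn(model, forget_loader, m=0.2, lr=1e-4, steps=1000,
            use_hard=False, device="cuda"):
    """Run dispersion unlearning: update the encoder on forget-set batches only."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # plain gradient-descent step
    batches = iter(forget_loader)
    for _ in range(steps):
        try:
            imgs, labels = next(batches)
        except StopIteration:                           # recycle the forget set
            batches = iter(forget_loader)
            imgs, labels = next(batches)
        imgs, labels = imgs.to(device), labels.to(device)
        z = F.normalize(model(imgs), dim=1)             # unit-norm embeddings
        loss = dispersion_loss(z, labels, m=m, hard=use_hard)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```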
4. Selective Impact and Cluster Structure Preservation
Dispersion-based unlearning preserves retrieval utility for non-forgotten identities via several mechanisms:
- Selective Gradients: Sampling exclusively from $\mathcal{D}_f$ ensures parameter updates do not affect the representations of retained identities.
- Hyperspherical Repulsion: The margin $m$ in the hinge term bounds dispersion, preventing unwanted distortion of the embedding space or over-dispersal.
- No Classifier Perturbation: Classifier weights or decision boundaries, common targets in classification-based unlearning approaches, remain unaltered. Only the local structures of forget-set clusters are modified.
This suggests that the method minimally impacts the discriminative power of the model for classes outside the forget set.
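One simple way to probe this claim empirically, outside the paper's protocol, is to compare retain-set embeddings produced before and after unlearning; in the hypothetical check below, values near 1.0 indicate that the retained geometry is essentially unchanged.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retain_drift(model_before, model_after, retain_loader, device="cuda"):
    """Average cosine similarity between retain-set embeddings produced by the
    original and the unlearned encoder (1.0 = identical directions)."""
    sims = []
    for imgs, _ in retain_loader:
        imgs = imgs.to(device)
        z0 = F.normalize(model_before(imgs), dim=1)
        z1 = F.normalize(model_after(imgs), dim=1)
        sims.append((z0 * z1).sum(dim=1))               # per-image cosine similarity
    return torch.cat(sims).mean().item()
```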
5. Experimental Results and Baseline Comparisons
Backbone and Training:
Architecture: IResNet-50 trained with the CosFace loss; pretraining on Glint360K with fine-tuning on CelebA; 512-D unit-norm embeddings.
Dispersion Hyperparameters:
Learning rate: $10^{-4}$; margin: $0.2$; batch size: $32$ (CelebA), $160$ (VGGFace2); unlearning steps: $1000$.
Baselines:
Random Labeling, Gradient Ascent, Lipschitz Unlearning, Contrastive Unlearning, Boundary Shrink (adapted to CosFace).
Quantitative Performance:
- On CelebA forget-set:
- Original model: mAP , R@1
- Best baseline (Boundary Shrink): mAP , R@1
- Dispersion Loss: mAP , R@1
- Hard-Dispersion: mAP , R@1
- Retention (CFP-FP/VGGFace2):
- Original: $91.0/97.5$, $89.1/98.8$
- Dispersion: $89.2/97.3$, $87.5/98.8$
- Cluster Compactness (CelebA, forget set):
- Original: $0.615$; Boundary Shrink: $0.255$; Dispersion: $0.089$; Hard-Dispersion: $0.086$
- VGGFace2 (extended, forget set):
- Dispersion: mAP , R@1 ; Hard-Dispersion: mAP , R@1
These results demonstrate markedly superior forgetting (larger drops in mAP/R@1 and cluster compactness) relative to all baselines, with essentially unperturbed performance for retained classes (Zakharov, 15 Dec 2025).
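For reference, the retrieval metrics behind these numbers can be computed as in the generic cosine-similarity sketch below; the paper's exact evaluation protocol (query/gallery construction, tie handling) may differ.

```python
import torch

@torch.no_grad()
def retrieval_metrics(query, gallery, q_labels, g_labels):
    """Recall@1 and mean average precision for cosine-similarity retrieval.

    `query`, `gallery`: L2-normalized embedding matrices; labels: identity ids.
    """
    sim = query @ gallery.T                                    # (Q, G) similarities
    order = sim.argsort(dim=1, descending=True)                # ranked gallery per query
    relevant = (g_labels[order] == q_labels[:, None]).float()  # relevance in rank order
    r_at_1 = relevant[:, 0].mean().item()
    ranks = torch.arange(1, relevant.shape[1] + 1, device=sim.device, dtype=torch.float)
    precision_at_k = relevant.cumsum(dim=1) / ranks            # precision at each rank
    ap = (precision_at_k * relevant).sum(dim=1) / relevant.sum(dim=1).clamp(min=1)
    return {"R@1": r_at_1, "mAP": ap.mean().item()}
```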
6. Geometric Rationale and Superiority over Existing Approaches
The effectiveness of hyperspherical dispersion arises from several geometric and algorithmic properties:
- Direct Geometric Manipulation: Classification-based unlearning corrupts classifier weights or labels, but underlying embedding clusters often remain compact, allowing successful retrieval via nearest neighbor search. Dispersion loss destroys local cluster cohesion directly at the embedding level.
- Margin-Based Repulsion: The hinge margin on pairwise cosine values drives the similarity between any two embeddings of the same forgotten identity below $-m$, i.e., toward an angular separation of at least $\arccos(-m)$ (roughly $101.5^\circ$ for $m = 0.2$), maximizing their dispersal within the available hyperspherical surface.
- Preservation of Embedding Structure: Embeddings maintain unit norm. The retention of the global hyperspherical structure means non-forgotten clusters remain unaffected, trading retrieval accuracy for forgotten identities in a tightly controlled manner.
- Algorithmic Simplicity and Robustness: The approach has a single, interpretable hyperparameter (the margin $m$). It requires no per-class retraining or additional regularization and delivers stable, reproducible forgetting outcomes with minimal risk of unintended side effects.
In summary, dispersion losses—uniform and hard—provide an explicit and geometrically principled method to dissolve compact identity clusters on the face-embedding hypersphere, producing dramatic and targeted degradation in retrievability for forgotten identities, while leaving the retrieval performance for all other faces almost unchanged. This surpasses the efficacy and selectivity of all previously proposed approximate unlearning strategies for embedding-based face retrieval (Zakharov, 15 Dec 2025).