
Dispersion-Based Unlearning Approach

Updated 22 December 2025
  • The paper introduces a method that disperses embeddings on a hypersphere to effectively erase specific face identities from retrieval systems.
  • It employs both uniform and hard-dispersion loss functions to break compact clusters, significantly reducing retrieval metrics like mAP and R@1 for forgotten identities.
  • The approach confines parameter updates to the forget set, preserving the global geometry and retrieval performance for retained identities.

A dispersion-based unlearning approach is a methodology for selectively erasing information associated with particular classes or identities from deep embedding-based retrieval systems. It operates by dispersing the embeddings of selected ("forget") identities over the hypersphere to prevent the formation of compact clusters, thereby rendering these identities unretrievable via standard similarity search while preserving model utility for all retained identities. This technique addresses privacy concerns and regulatory compliance in surveillance-oriented machine learning by providing a practical and effective means of face identity forgetting within state-of-the-art embedding architectures (Zakharov, 15 Dec 2025).

1. Problem Setting and Unlearning Objective

Given a pretrained face-embedding model $f_\omega: \mathbb{R}^{H\times W\times 3} \to \mathbb{R}^d$ that produces embeddings $x_i = f_\omega(I_i)$, the output vectors are $\ell_2$-normalized to the unit hypersphere: $\hat{x}_i = x_i / \|x_i\|_2$, so $\|\hat{x}_i\|_2 = 1$. Let the complete dataset $\mathcal{D} = \{(I_i, y_i)\}$ contain $K$ distinct identities. This set is partitioned into a "forget" set $\mathcal{D}_f$, corresponding to the identities $\mathcal{P}_u$ to be forgotten, and a "retain" set $\mathcal{D}_r$ with the remaining identities.

The unlearning goal is formalized in retrieval terms:

  • (A) For any forget-set query $I$ and any other face $I_p$ of the same identity versus a negative $I_n$ of a different identity, the post-unlearning parameters $\omega'$ flip the retrieval ordering: $\varphi_{\omega'}(I, I_p) < \varphi_{\omega'}(I, I_n)$, where $\varphi(\cdot, \cdot)$ denotes cosine similarity, i.e., $\varphi(\hat{x}_i, \hat{x}_j) = \hat{x}_i^\top \hat{x}_j$.
  • (B) For all retained identities, the original retrieval orderings are preserved: $\varphi_{\omega'}(I, I_p) > \varphi_{\omega'}(I, I_n)$.
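
Criterion (A) can be checked directly on embedding similarities. The sketch below is a minimal PyTorch rendering, assuming a post-unlearning encoder f_omega_prime and single-image tensors; the function name and shapes are illustrative, not taken from the paper.

import torch
import torch.nn.functional as F

def retrieval_flipped(f_omega_prime, I, I_p, I_n):
    # Criterion (A): after unlearning, the same-identity face I_p should rank
    # below the different-identity face I_n for the forget-set query I.
    # I, I_p, I_n: image tensors of shape (1, 3, H, W).
    with torch.no_grad():
        q, p, n = (F.normalize(f_omega_prime(img), dim=1) for img in (I, I_p, I_n))
    return (q @ p.t()).item() < (q @ n.t()).item()   # cosine(I, I_p) < cosine(I, I_n)

Criterion (B) is the same comparison with the inequality reversed, evaluated on retained identities.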

Cluster Compactness Score (CS):

For a set $\mathcal{P}$ of identities (forget or retain), compactness is:

$$\mathrm{CS}(\mathcal{P}) = \frac{1}{|\mathcal{P}|}\sum_{p\in\mathcal{P}} \left(\frac{1}{n_p(n_p-1)} \sum_{i \neq j \in \text{class } p} \hat{x}_i^\top \hat{x}_j\right)$$

where $n_p$ is the number of samples of identity $p$.

Lower CS indicates a more dispersed (less compact) identity cluster.
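
The compactness score maps directly onto a few tensor operations. The sketch below is an illustrative PyTorch implementation of the formula above (argument names and shapes are assumptions, not the paper's code):

import torch
import torch.nn.functional as F

def compactness_score(x, y):
    # CS over the identities present in the batch: mean pairwise cosine similarity
    # within each identity, averaged across identities.
    # x: (N, d) raw embeddings, y: (N,) integer identity labels.
    x_hat = F.normalize(x, dim=1)            # project onto the unit hypersphere
    sim = x_hat @ x_hat.t()                  # all pairwise cosine similarities
    per_identity = []
    for p in y.unique():
        idx = (y == p).nonzero(as_tuple=True)[0]
        n_p = idx.numel()
        if n_p < 2:                          # CS is undefined for singleton identities
            continue
        s = sim[idx][:, idx]
        off_diag = s.sum() - s.diagonal().sum()
        per_identity.append(off_diag / (n_p * (n_p - 1)))
    return torch.stack(per_identity).mean()

A well-trained model yields a high CS on every identity; after dispersion-based unlearning, CS should collapse on the forget set while remaining high on the retain set.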

2. Dispersion Losses and Mathematical Formulation

To directly erase identity structure, the dispersion approach employs mini-batches drawn solely from $\mathcal{D}_f$.

  • For a batch of size $B$, define for each anchor $i$ the positive-index set $P_i = \{\,j \in [1..B] \mid y_j = y_i,\ j \neq i\,\}$ and $\mathcal{N}_{\mathrm{pos}} = \{\,i \mid |P_i| > 0\,\}$.

Uniform Dispersion Loss:

$$L_{\rm disp} = \frac{1}{|\mathcal{N}_{\rm pos}|} \sum_{i\in\mathcal{N}_{\rm pos}} \frac{1}{|P_i|} \sum_{j\in P_i} \max\left(0,\ m + \hat x_i^\top \hat x_j\right)$$

where $m > 0$ is a margin hyperparameter. The hinge penalty activates whenever two same-identity embeddings are too close (cosine similarity above $-m$), driving within-class pairs further apart.
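
A compact PyTorch rendering of $L_{\rm disp}$ is shown below; it assumes a batch of already $\ell_2$-normalized embeddings x_hat of shape (B, d) and integer labels y, and vectorizes the per-anchor averaging with a positive-pair mask (an illustrative sketch, not the paper's implementation):

import torch
import torch.nn.functional as F

def uniform_dispersion_loss(x_hat, y, m=0.2):
    # Hinge on the cosine similarity of every same-identity pair in the batch.
    B = x_hat.size(0)
    sim = x_hat @ x_hat.t()                                    # (B, B) pairwise cosines
    eye = torch.eye(B, dtype=torch.bool, device=y.device)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)) & ~eye            # mask of P_i for each anchor
    hinge = F.relu(m + sim) * pos                              # penalize pairs with cos > -m
    n_pos = pos.sum(dim=1)                                     # |P_i|
    has_pos = n_pos > 0                                        # anchors in N_pos
    return (hinge.sum(dim=1)[has_pos] / n_pos[has_pos]).mean()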

Hard-Dispersion Loss:

$$L_{\rm hard\_disp} = \frac{1}{|\mathcal{N}_{\rm pos}|} \sum_{i\in\mathcal{N}_{\rm pos}} \max\left[0,\ m + \max_{j\in P_i} \hat x_i^\top \hat x_j\right]$$

This loss targets the most similar positive for each anchor, more aggressively disrupting the tightest intra-class links.
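
The hard variant differs only in the inner reduction: it takes the maximum same-identity similarity per anchor before applying the hinge. The sketch below mirrors the previous snippet (again illustrative):

import torch
import torch.nn.functional as F

def hard_dispersion_loss(x_hat, y, m=0.2):
    # Hinge only on the most similar same-identity pair for each anchor.
    B = x_hat.size(0)
    sim = x_hat @ x_hat.t()
    eye = torch.eye(B, dtype=torch.bool, device=y.device)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)) & ~eye
    has_pos = pos.any(dim=1)
    hardest = sim.masked_fill(~pos, float('-inf')).max(dim=1).values   # max over P_i
    return F.relu(m + hardest[has_pos]).mean()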

Combined Objective:

In the main experiments, only the dispersion loss is optimized:

$$L_{\rm total} = \lambda_{\rm disp}\, L_{\rm disp} + \lambda_{\rm retain}\, L_{\rm retain}$$

with $\lambda_{\rm disp}=1$ and $\lambda_{\rm retain}=0$; typical settings are $m=0.2$, learning rate $1\times 10^{-4}$, and $T=1000$ steps.
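
With these weights the total objective reduces to the dispersion term alone. The helper below sketches the weighting; loss_disp would come from one of the loss sketches above, and loss_retain is left as a placeholder because its exact form is not specified here.

def total_loss(loss_disp, loss_retain, lam_disp=1.0, lam_retain=0.0):
    # L_total = lam_disp * L_disp + lam_retain * L_retain; with the main-experiment
    # setting (lam_disp = 1, lam_retain = 0) this is just the dispersion loss.
    return lam_disp * loss_disp + lam_retain * loss_retain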

3. Dispersion-Unlearning Algorithm

The operational steps are as follows:

import torch
import torch.nn.functional as F

def dispersion_unlearn(f_omega, forget_loader, m=0.2, lr=1e-4, T=1000, use_hard=False):
    # f_omega        -- pretrained face encoder (PyTorch module)
    # forget_loader  -- DataLoader over forget-set images 𝒟_f, yielding (I_i, y_i) batches
    # m              -- margin (e.g. 0.2)
    # lr             -- learning rate (e.g. 1e-4); batch size B (32 or 160) is set by the loader
    # T              -- number of unlearning steps (e.g. 1000)
    # use_hard       -- hard-dispersion if True, uniform dispersion otherwise
    opt = torch.optim.SGD(f_omega.parameters(), lr=lr)
    batches = iter(forget_loader)
    for t in range(T):
        try:
            images, labels = next(batches)
        except StopIteration:                 # cycle through 𝒟_f as needed
            batches = iter(forget_loader)
            images, labels = next(batches)
        x = f_omega(images)                   # embeddings x_i
        x_hat = F.normalize(x, dim=1)         # unit-norm embeddings x̂_i
        sim = x_hat @ x_hat.t()               # pairwise cosine similarities
        eye = torch.eye(labels.numel(), dtype=torch.bool, device=labels.device)
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye   # positive-pair mask P_i
        has_pos = pos.any(dim=1)              # anchors in 𝒩_pos
        if not has_pos.any():
            continue
        if use_hard:
            hardest = sim.masked_fill(~pos, float('-inf')).max(dim=1).values
            loss = F.relu(m + hardest[has_pos]).mean()
        else:
            hinge = F.relu(m + sim) * pos
            loss = (hinge.sum(dim=1)[has_pos] / pos.sum(dim=1)[has_pos]).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()                            # ω ← ω − lr · ∇_ω L
    return f_omega

All gradients are computed on $\mathcal{D}_f$ only; no training signal is backpropagated through $\mathcal{D}_r$, so the geometry of retained identities remains stable.
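
A hypothetical end-to-end invocation of the procedure above, assuming a pretrained PyTorch encoder f_omega and a dataset that yields only forget-set images; the dataset wiring and variable names are illustrative.

from torch.utils.data import DataLoader

# forget_dataset yields (image, identity_label) pairs for identities in P_u only
forget_loader = DataLoader(forget_dataset, batch_size=32, shuffle=True, drop_last=True)

# CelebA-style setting: m = 0.2, lr = 1e-4, T = 1000 steps, uniform dispersion
f_unlearned = dispersion_unlearn(f_omega, forget_loader, m=0.2,
                                 lr=1e-4, T=1000, use_hard=False)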

4. Selective Impact and Cluster Structure Preservation

Dispersion-based unlearning preserves retrieval utility for non-forgotten identities via several mechanisms:

  • Selective Gradients: Sampling exclusively from $\mathcal{D}_f$ ensures parameter updates do not affect the representations of retained identities.
  • Hyperspherical Repulsion: The margin $m$ in the hinge term bounds dispersion, preventing unwanted distortion of the embedding space or over-dispersal.
  • No Classifier Perturbation: Classifier weights or decision boundaries, common targets in classification-based unlearning approaches, remain unaltered. Only the local structures of forget-set clusters are modified.

This suggests that the method minimally impacts the discriminative power of the model for classes outside the forget set.

5. Experimental Results and Baseline Comparisons

Backbone and Training:

Architecture: IResNet-50 with CosFace loss ($s=64$, $m=0.4$); pretraining on Glint360K, fine-tuning on CelebA; 512-D unit-norm embeddings.

Dispersion Hyperparameters:

Learning rate: $1\times 10^{-4}$; margin: $0.2$; batch size: $32$ (CelebA), $160$ (VGGFace2); unlearning steps: $1000$; $\lambda_{\rm retain}=0$.

Baselines:

Random Labeling, Gradient Ascent, Lipschitz Unlearning, Contrastive Unlearning, Boundary Shrink (adapted to CosFace).

Quantitative Performance:

  • On the CelebA forget set:
    • Original model: mAP $= 88.7$, R@1 $= 98.17$
    • Best baseline (Boundary Shrink): mAP $\approx 60.2$, R@1 $\approx 96.5$
    • Dispersion loss: mAP $\approx 17.8$, R@1 $\approx 67.4$
    • Hard-dispersion: mAP $\approx 16.0$, R@1 $\approx 65.9$
  • Retention (CFP-FP / VGGFace2):
    • Original: $91.0/97.5$, $89.1/98.8$
    • Dispersion: $89.2/97.3$, $87.5/98.8$
  • Cluster compactness (CelebA, forget set):
    • Original: $0.615$; Boundary Shrink: $0.255$; Dispersion: $0.089$; Hard-dispersion: $0.086$
  • VGGFace2 (extended, forget set):
    • Dispersion: mAP $\approx 4.7$, R@1 $\approx 79.0$; Hard-dispersion: mAP $\approx 1.5$, R@1 $\approx 51.6$

These results demonstrate markedly superior forgetting (larger drops in mAP/R@1 and cluster compactness) relative to all baselines, with essentially unperturbed performance for retained classes (Zakharov, 15 Dec 2025).
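
For context, the R@1 numbers above correspond to a simple nearest-neighbor check over a gallery. The sketch below is a generic implementation of this metric (not the paper's evaluation code), assuming precomputed unit-norm query and gallery embeddings with integer identity labels:

import torch

def recall_at_1(q_emb, q_id, g_emb, g_id):
    # Fraction of queries whose top-1 gallery neighbor shares their identity.
    # q_emb: (Nq, d), g_emb: (Ng, d), both l2-normalized; q_id: (Nq,), g_id: (Ng,).
    sim = q_emb @ g_emb.t()                  # cosine similarities, query x gallery
    nn_idx = sim.argmax(dim=1)               # index of the closest gallery item
    return (g_id[nn_idx] == q_id).float().mean().item()

Measured this way on the forget set, R@1 should fall sharply after unlearning while staying essentially unchanged for retained identities.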

6. Geometric Rationale and Superiority over Existing Approaches

The effectiveness of hyperspherical dispersion arises from several geometric and algorithmic properties:

  1. Direct Geometric Manipulation: Classification-based unlearning corrupts classifier weights or labels, but underlying embedding clusters often remain compact, allowing successful retrieval via nearest neighbor search. Dispersion loss destroys local cluster cohesion directly at the embedding level.
  2. Margin-Based Repulsion: The hinge margin on pairwise cosine similarities guarantees an angular separation of at least $\arccos(-m)$ between any two embeddings of the same forgotten identity, maximizing their dispersal within the available hyperspherical surface (a worked example follows this list).
  3. Preservation of Embedding Structure: Embeddings maintain unit norm. The retention of the global hyperspherical structure means non-forgotten clusters remain unaffected, trading retrieval accuracy for forgotten identities in a tightly controlled manner.
  4. Algorithmic Simplicity and Robustness: The approach has a single, interpretable hyperparameter ($m$). It requires no per-class retraining or additional regularization and delivers stable, reproducible forgetting outcomes with minimal risk of unintended side effects.
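
As a worked example of the second point, with the reported margin $m=0.2$ the implied minimum angle between two embeddings of the same forgotten identity is (a simple numerical check, not a figure quoted from the paper):

$$\theta_{\min} = \arccos(-m) = \arccos(-0.2) \approx 1.77\ \text{rad} \approx 101.5^\circ$$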

In summary, dispersion losses—uniform and hard—provide an explicit and geometrically principled method to dissolve compact identity clusters on the face-embedding hypersphere, producing dramatic and targeted degradation in retrievability for forgotten identities, while leaving the retrieval performance for all other faces almost unchanged. This surpasses the efficacy and selectivity of all previously proposed approximate unlearning strategies for embedding-based face retrieval (Zakharov, 15 Dec 2025).

