
Geometric Disentanglement Unlearning (GU)

Updated 28 November 2025
  • Geometric-disentanglement unlearning (GU) is a framework that uses orthogonal projection to selectively forget targeted data influences while preserving performance on retained data.
  • It leverages first-order Taylor approximations and orthogonality in parameter and representation spaces to disentangle updates and minimize collateral impact during unlearning.
  • GU has been applied to settings such as language modeling and backdoor removal, with provable guarantees on the trade-off between forgetting and retention.

Geometric-disentanglement unlearning (GU) is a principled framework for selectively removing the influence of specified subsets of training data or behaviors from machine learning models while minimizing unintended degradation of retained knowledge. By leveraging geometric properties of parameter or representation space, GU systematically disentangles updates affecting the "forget" set from those relevant to the "retain" set, achieving provably optimal or near-optimal unlearning with rigorous guarantees at both the classification and representation levels. GU encompasses a body of techniques unified by first-order orthogonal projections and has been instantiated in multiple settings, including language modeling, representation learning, and backdoor removal (Zhou et al., 21 Nov 2025, Le et al., 24 Nov 2025, Abdelraheem et al., 16 Oct 2025).

1. Background and Motivation

The machine unlearning problem requires effectively "forgetting" information attributable to a subset of data (the forget set $D_f$) from a deployed model $\pi_\theta$, while preserving performance on the remaining data (the retain set $D_r$). Traditional gradient ascent on forget samples is prone to harmful collateral effects, as the induced parameter updates are often entangled with directions vital for generalization on $D_r$. This entanglement introduces an inherent tradeoff between effective forgetting and retention fidelity (Zhou et al., 21 Nov 2025). Earlier approaches, including fine-tuning and bi-objective loss optimization, lack formal analysis of these tradeoffs and may perturb retained knowledge due to the geometric overlap of forget and retain directions in parameter space (Le et al., 24 Nov 2025, Abdelraheem et al., 16 Oct 2025).

Orthogonalization and projection-based updates provide a geometric lens: ideal unlearning directions are those orthogonal to all retained-task gradients, so that the retain-set loss is unchanged to first order and only the forget subspace is affected (Zhou et al., 21 Nov 2025, Abdelraheem et al., 16 Oct 2025). This motivates the broad class of geometric-disentanglement unlearning methods.

2. Theoretical Foundations and First-order Analysis

GU is grounded in a first-order Taylor approximation of loss changes under small parameter updates. For parameters $\theta \in \mathbb{R}^p$ and losses $L_r(\theta)$ (retain) and $L_f(\theta)$ (forget), the change in retain loss after an update $\delta\theta$ is

$$\Delta^{(1)} L_r \approx \nabla_\theta L_r(\theta)^\top \delta\theta.$$

To preserve retention ($\Delta^{(1)} L_r = 0$), $\delta\theta$ must be orthogonal to the span of per-example retain gradients,

$$T_r := \mathrm{span}\left\{ \nabla_\theta \ell_r(x_i; \theta) : x_i \in D_r \right\}.$$

The orthogonal projector onto $T_r$ is

$$P_\parallel = G(G^\top G)^{-1} G^\top, \qquad P_\perp = I - P_\parallel,$$

where the columns of $G$ are the per-example retain gradients. Any forget gradient $g_f$ decomposes as $g_f = P_\parallel g_f + P_\perp g_f$, with $P_\perp g_f$ the disentangled, retain-invariant direction (Zhou et al., 21 Nov 2025). The optimal GU update under a trust-region constraint ($\|\delta\theta\| \leq \epsilon$) is

$$\delta\theta^* = \epsilon\, \frac{P_\perp \nabla L_f(\theta)}{\|P_\perp \nabla L_f(\theta)\|_2},$$

which provably maximizes the first-order decrease in forget-set loss while leaving the retain loss unchanged to first order (Zhou et al., 21 Nov 2025).
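
To make the projection concrete, here is a minimal NumPy sketch: it builds an orthonormal basis for $T_r$ via QR factorization (numerically safer than explicitly forming $(G^\top G)^{-1}$) and returns the trust-region-optimal update. All names and the toy dimensions are illustrative, not from the papers.

```python
import numpy as np

def optimal_gu_update(G, grad_f, eps):
    """Trust-region-optimal GU step from stacked retain gradients.

    G      : (p, k) matrix whose columns are per-example retain gradients
    grad_f : (p,) forget-loss gradient
    eps    : trust-region radius ||delta_theta|| <= eps
    """
    # Orthonormal basis Q for T_r = span(columns of G); QR avoids
    # inverting G^T G, which is often ill-conditioned in practice.
    Q, _ = np.linalg.qr(G)
    # P_perp g_f = g_f - Q Q^T g_f  (projection onto T_r^perp).
    g_perp = grad_f - Q @ (Q.T @ grad_f)
    # Normalize to the trust-region boundary; the sign convention
    # depends on whether L_f is ascended or descended by the host method.
    return eps * g_perp / np.linalg.norm(g_perp)

# Toy check: the update has zero first-order effect on the retain loss.
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 8))   # 8 retain gradients in R^100
g_f = rng.standard_normal(100)
step = optimal_gu_update(G, g_f, eps=0.1)
assert np.allclose(G.T @ step, 0.0, atol=1e-10)  # orthogonal to all retain gradients
```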

3. Algorithms and Practical Instantiations

GU is a "plug-in" framework that can be attached to gradient-based unlearning objectives. The core implementation steps are as follows (a code sketch appears after the list):

  1. Compute retain gradients on a small batch to estimate $T_r$.
  2. Form an orthonormal basis for $T_r$ (e.g., via Gram-Schmidt).
  3. Compute the (bi-objective) base gradient $g_{\mathrm{tot}}$, typically $\gamma \nabla L_f + \alpha \nabla L_r$.
  4. Project the forget component $g_f$ onto $T_r^\perp$.
  5. Apply $\theta \leftarrow \theta - \eta\, P_\perp g_f / \|P_\perp g_f\|$, optionally line-searching the step size.
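
These steps translate into a short PyTorch sketch under stated assumptions: `model`, `loss_fn(outputs, targets)`, and the `(inputs, targets)` batch tuples are placeholders, gradients are flattened into a single vector for clarity, and the per-example loop is the simplest (not the fastest) way to estimate $T_r$.

```python
import torch

def gu_step(model, loss_fn, retain_batch, forget_batch, lr=1e-3, k=16):
    """One projected unlearning step (steps 1-5 above); sketch only."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(loss):
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.reshape(-1) for g in grads])

    # Steps 1-2: per-example retain gradients -> orthonormal basis of T_r.
    xr, yr = retain_batch
    G = torch.stack([flat_grad(loss_fn(model(xr[i:i+1]), yr[i:i+1]))
                     for i in range(min(k, len(xr)))])   # (k, p)
    Q, _ = torch.linalg.qr(G.T)                          # (p, k), spans T_r

    # Steps 3-4: forget gradient projected onto T_r^perp (only the
    # forget component of the bi-objective gradient needs projecting).
    xf, yf = forget_batch
    g_f = flat_grad(loss_fn(model(xf), yf))
    g_perp = g_f - Q @ (Q.T @ g_f)
    g_perp /= g_perp.norm()

    # Step 5: apply the normalized retain-invariant update.
    with torch.no_grad():
        offset = 0
        for p in params:
            n = p.numel()
            p -= lr * g_perp[offset:offset + n].view_as(p)
            offset += n
```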

In high-dimensional models (e.g., LLMs or vision transformers), projections are often restricted to selected layers. Pragmatically, the basis dimension $k$ (the rank of $T_r$) is kept small (e.g., $k = 16$–$64$), and the extra computational cost remains modest (Zhou et al., 21 Nov 2025).

An alternative parameter-space approach, TBAR (Triggered Backdoor Attenuation via Removal), applies GU to backdoor forgetting by treating malicious behaviors as low-dimensional task vectors $\tau_t$ and projecting onto the complement of their span (Abdelraheem et al., 16 Oct 2025). Formally, a backdoored model has parameters

$$\theta_b \approx \theta_{\mathrm{pre}} + \tau_c + \tau_t,$$

and unlearning is performed by subtracting $\alpha \tau_t$, where $\alpha$ is tuned on validation data to balance removal efficacy against clean accuracy.
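
In the task-arithmetic view this amounts to plain parameter-space vector arithmetic. A hedged sketch follows; estimating $\tau_t$ by fine-tuning the pretrained model on triggered data alone is an assumption here, and TBAR's full procedure also involves a projection step.

```python
import torch

def subtract_task_vector(theta_b, theta_pre, theta_trigger, alpha):
    """Remove a backdoor task vector: theta <- theta_b - alpha * tau_t.

    All arguments are state dicts; tau_t = theta_trigger - theta_pre is
    one illustrative way to estimate the trigger task vector.
    """
    cleaned = {}
    for name in theta_b:
        tau_t = theta_trigger[name] - theta_pre[name]
        cleaned[name] = theta_b[name] - alpha * tau_t
    return cleaned

# alpha is swept on a validation set to trade off attack success rate
# against clean accuracy, as described above.
```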

At the representation level, POUR (Provably Optimal Unlearning of Representations) realizes GU by projecting penultimate-layer features onto the complement of the forgotten class's direction in the simplex equiangular tight frame (ETF), preserving geometric optimality for the retained classes under the neural collapse phenomenon (Le et al., 24 Nov 2025).

4. GU in Representation Space: Neural Collapse and the POUR Framework

Under neural collapse (NC) at terminal training, penultimate-layer features and the final-layer classifier weights form a simplex ETF structure. Let $v_i$ denote the class-$i$ mean in $\mathbb{R}^{C-1}$ with ETF properties:

$$\|v_i\| = 1, \qquad v_i^\top v_j = -\frac{1}{C-1}\ (i \neq j), \qquad \sum_{i=1}^{C} v_i = 0.$$

For unlearning class $u$, the closed-form optimal GU operator is the orthogonal projector

$$P = I - \frac{v_u v_u^\top}{\|v_u\|^2},$$

which collapses the class-$u$ direction while maintaining a perfect ETF for the remaining $C-1$ classes in a lower-dimensional subspace (Le et al., 24 Nov 2025).
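
A minimal NumPy sketch (illustrative, not the papers' code) constructs a simplex ETF, applies the closed-form projector, and checks both claims: the class-$u$ mean collapses, and the retained means remain an ETF.

```python
import numpy as np

C, u = 5, 2  # number of classes and the class to unlearn (illustrative)

# Simplex ETF: C unit vectors with pairwise inner product -1/(C-1).
# Embedded in R^C (rank C-1); equivalent up to rotation to the
# R^{C-1} formulation in the text.
V = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

# Closed-form GU projector removing the class-u direction.
v_u = V[u]
P = np.eye(C) - np.outer(v_u, v_u) / (v_u @ v_u)

# The forgotten class mean collapses to the origin...
assert np.allclose(P @ v_u, 0.0)

# ...while the retained means keep equal norms and equal pairwise
# angles, i.e., they remain an ETF in a lower-dimensional subspace.
R = np.array([P @ V[i] for i in range(C) if i != u])
norms = np.linalg.norm(R, axis=1)
gram = (R / norms[:, None]) @ (R / norms[:, None]).T
assert np.allclose(norms, norms[0])
off_diag = gram[~np.eye(C - 1, dtype=bool)]
assert np.allclose(off_diag, off_diag[0])   # equals -1/(C-2)
```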

At the classifier level, one may

  • project the feature-extractor output $z \mapsto Pz$ ("PU-Z"), or
  • project the final-layer weights $W \mapsto PW$ ("PU-W"),

both eliminating recoverable information about $u$ in a single operation; a sketch of both variants follows the list.
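
Both variants are single linear operations. A hedged PyTorch sketch, assuming a feature extractor `backbone` and a linear head `fc` whose weight rows live in feature space (the names are illustrative, and $P$ is the symmetric projector from above):

```python
import torch

@torch.no_grad()
def pu_w(fc, P):
    # PU-W: project each classifier row w_c -> P w_c; since the
    # projector P is symmetric, this is W @ P for W of shape (C, d).
    fc.weight.copy_(fc.weight @ P)

def pu_z(backbone, P, x):
    # PU-Z: project penultimate features z -> P z (batched rows: Z @ P).
    return backbone(x) @ P
```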

To propagate unlearning deeper into feature representations, POUR-D applies projection-guided distillation: a student network is trained to match projected (collapsed) features on the forget set, enforcing ETF structure preservation by $L_2$ alignment (Le et al., 24 Nov 2025).
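
A minimal sketch of this distillation objective, assuming a frozen `teacher`, a trainable `student`, and plain MSE for the $L_2$ alignment (whether POUR-D adds a retain-set term exactly as below is an assumption):

```python
import torch
import torch.nn.functional as F

def pour_d_loss(student, teacher, P, x_forget, x_retain):
    # Forget set: student features should match the teacher's features
    # collapsed through the projector P (class-u direction removed).
    z_f_target = teacher(x_forget).detach() @ P
    loss_forget = F.mse_loss(student(x_forget), z_f_target)
    # Retain set: student features track the teacher unchanged,
    # preserving the ETF structure for the remaining classes.
    z_r_target = teacher(x_retain).detach()
    loss_retain = F.mse_loss(student(x_retain), z_r_target)
    return loss_forget + loss_retain
```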

5. Experimental Evaluation and Metrics

GU has been benchmarked across LLM unlearning benchmarks (TOFU, MUSE), backdoored vision models, and image datasets (CIFAR-10/100, PathMNIST). Metrics in use include:

  • Extraction Strength (ES): quantifies membership inference or risk association on forget/retain splits (lower ES on $D_f$ and higher on $D_r$ is desirable).
  • Clean Accuracy (CA): accuracy on retained/clean data.
  • Attack Success Rate (ASR): for backdoor unlearning, the rate at which the trigger still succeeds after unlearning.
  • Adaptive Unlearning Score (AUS): for classification-level selectivity.
  • Membership Inference Attack Rate (rMIA): privacy leakage at the representation level.
  • Representation Unlearning Score (RUS): combines CKA similarities between pre- and post-unlearning representations on the forget and retain sets, balancing erasure of forgotten information against retention of clean knowledge.
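
Since RUS builds on CKA, a standard linear-CKA computation is sketched below for reference; how RUS combines the forget- and retain-set CKA terms is specified in the paper and not reproduced here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices of shape
    (n_samples, dim); higher means more similar representations."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))

# RUS-style usage: low CKA between pre/post-unlearning features on the
# forget set, high CKA on the retain set.
```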

Empirically, GU consistently shifts the Pareto frontier, reducing ES/ASR on the forget set without degrading, and in some instances improving, CA/ES on the retain set. POUR achieves instantaneous and perfect class forgetting at the representation level with negligible collateral impact under neural collapse, verified by t-SNE plots and CKA statistics (Le et al., 24 Nov 2025, Zhou et al., 21 Nov 2025, Abdelraheem et al., 16 Oct 2025).

6. Limitations, Applicability, and Future Directions

GU methods are most effective for large models and scenarios where the entanglement between forget and retain directions is significant. The primary limitations are:

  • Local linearity: The first-order orthogonality guarantee assumes small-step updates; large-step nonlinearity may violate theoretical retain-invariance (Zhou et al., 21 Nov 2025).
  • Approximate retain subspaces: practical estimation of $T_r$ via mini-batching or restricted layers may incompletely cover the true retain subspace.
  • Computational overhead: Basis maintenance and projections incur additional, but typically moderate, costs.

A plausible implication is that GU may not fully generalize when the geometric structures (e.g., the ETF) are poorly realized or when higher-order interactions dominate. Future work may incorporate second-order corrections, adaptive rank determination for $T_r$, or certified global guarantees (Zhou et al., 21 Nov 2025). Extensions to other unlearning targets, such as spurious correlations or privacy-sensitive concepts, are enabled by the generality of the underlying geometric principle (Abdelraheem et al., 16 Oct 2025).

7. Comparative Table of Representative GU Methods

| Method | Geometry Space | Core Operation |
|--------|----------------|----------------|
| GU (Zhou et al., 21 Nov 2025) | Parameter | Orthogonal projection of the forget gradient onto $T_r^\perp$ |
| TBAR (Abdelraheem et al., 16 Oct 2025) | Parameter | Linear task-vector subtraction and projection for backdoor removal |
| POUR-P / POUR-D (Le et al., 24 Nov 2025) | Representation | Simplex-ETF projection (POUR-P), plus distillation for deep unlearning (POUR-D) |

These approaches represent the state-of-the-art for first-order, provably "retain-safe" selective forgetting in neural networks, applicable to privacy, safety, and behavior-narrowing scenarios.
