
Geometric Disentanglement Unlearning (GU)

Updated 28 November 2025
  • Geometric-disentanglement unlearning (GU) is a framework that uses orthogonal projection to selectively forget targeted data influences while preserving performance on retained data.
  • It leverages first-order Taylor approximations and orthogonality in parameter and representation spaces to disentangle updates and minimize collateral impact during unlearning.
  • GU has been applied to settings such as language modeling and backdoor removal, with provable guarantees on the trade-off between forgetting and retention.

Geometric-disentanglement unlearning (GU) is a principled framework for selectively removing the influence of specified subsets of training data or behaviors from machine learning models while minimizing unintended degradation of retained knowledge. By leveraging geometric properties of parameter or representation space, GU systematically disentangles updates affecting the "forget" set from those relevant to the "retain" set, achieving provably optimal or near-optimal unlearning with rigorous guarantees at both the classification and representation levels. GU encompasses a body of techniques unified by first-order orthogonal projections and has been instantiated in multiple settings, including language modeling, representation learning, and backdoor removal (Zhou et al., 21 Nov 2025, Le et al., 24 Nov 2025, Abdelraheem et al., 16 Oct 2025).

1. Background and Motivation

The machine unlearning problem requires effectively "forgetting" information attributable to a subset of data (the forget set $D_f$) from a deployed model $\pi_\theta$, while preserving performance on the remaining data (the retain set $D_r$). Traditional gradient ascent on forget samples is prone to harmful collateral effects, as the induced parameter updates are often entangled with directions vital for generalization on $D_r$. This entanglement introduces an inherent tradeoff between effective forgetting and retention fidelity (Zhou et al., 21 Nov 2025). Earlier approaches, including fine-tuning and bi-objective loss optimization, lack formal analysis of these tradeoffs and may perturb retained knowledge due to the geometric overlap of forget and retain directions in parameter space (Le et al., 24 Nov 2025, Abdelraheem et al., 16 Oct 2025).

Orthogonalization and projection-based updates provide a geometric lens: ideal unlearning directions are those orthogonal to all retained-task gradients, so that the retain-set loss is unchanged to first order and only the forget subspace is affected (Zhou et al., 21 Nov 2025, Abdelraheem et al., 16 Oct 2025). This motivates the broad class of geometric-disentanglement unlearning methods.

2. Theoretical Foundations and First-order Analysis

GU is grounded in a first-order Taylor approximation of loss changes under small parameter updates. For parameters $\theta \in \mathbb{R}^p$ and losses $L_r(\theta)$ (retain) and $L_f(\theta)$ (forget), the change in retain loss after an update $\delta\theta$ is

$$\Delta^{(1)} L_r \approx \nabla_\theta L_r(\theta)^\top \delta\theta.$$

To preserve retention ($\Delta^{(1)} L_r = 0$), $\delta\theta$ must be orthogonal to the span of per-example retain gradients,

$$T_r := \mathrm{span}\left\{ \nabla_\theta \ell_r(x_i; \theta) : x_i \in D_r \right\}.$$

The orthogonal projector onto $T_r$ is

$$P_\parallel = G(G^\top G)^{-1} G^\top, \qquad P_\perp = I - P_\parallel,$$

where the columns of $G$ are the per-example retain gradients. Any forget gradient $g_f$ decomposes as $g_f = P_\parallel g_f + P_\perp g_f$, with $P_\perp g_f$ the disentangled, retain-invariant direction (Zhou et al., 21 Nov 2025). The optimal GU update under a trust-region constraint ($\|\delta\theta\| \leq \epsilon$) is

$$\delta\theta^* = \epsilon\, \frac{P_\perp \nabla L_f(\theta)}{\|P_\perp \nabla L_f(\theta)\|_2},$$

which provably maximizes the first-order decrease in forget-set loss while leaving the retain loss unchanged to first order (Zhou et al., 21 Nov 2025).
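
To make the projection concrete, here is a minimal NumPy sketch: it builds an orthonormal basis for $T_r$ via QR factorization (numerically safer than explicitly forming $(G^\top G)^{-1}$) and returns the trust-region-optimal update. All names and the toy dimensions are illustrative, not from the papers.

```python
import numpy as np

def optimal_gu_update(G, grad_f, eps):
    """Trust-region-optimal GU step from stacked retain gradients.

    G      : (p, k) matrix whose columns are per-example retain gradients
    grad_f : (p,) forget-loss gradient
    eps    : trust-region radius ||delta_theta|| <= eps
    """
    # Orthonormal basis Q for T_r = span(columns of G); QR avoids
    # inverting G^T G, which is often ill-conditioned in practice.
    Q, _ = np.linalg.qr(G)
    # P_perp g_f = g_f - Q Q^T g_f  (projection onto T_r^perp).
    g_perp = grad_f - Q @ (Q.T @ grad_f)
    # Normalize to the trust-region boundary; the sign convention
    # depends on whether L_f is ascended or descended by the host method.
    return eps * g_perp / np.linalg.norm(g_perp)

# Toy check: the update has zero first-order effect on the retain loss.
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 8))   # 8 retain gradients in R^100
g_f = rng.standard_normal(100)
step = optimal_gu_update(G, g_f, eps=0.1)
assert np.allclose(G.T @ step, 0.0, atol=1e-10)  # orthogonal to all retain gradients
```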

3. Algorithms and Practical Instantiations

GU is a "plug-in" framework that can be attached to gradient-based unlearning objectives. The core implementation steps are as follows (a code sketch appears after the list):

  1. Compute retain gradients on a small batch to estimate $T_r$.
  2. Form an orthonormal basis for $T_r$ (e.g., via Gram-Schmidt).
  3. Compute the (bi-objective) base gradient $g_{\mathrm{tot}}$, typically $\gamma \nabla L_f + \alpha \nabla L_r$.
  4. Project the forget component $g_f$ onto $T_r^\perp$.
  5. Apply $\theta \leftarrow \theta - \eta\, P_\perp g_f / \|P_\perp g_f\|$, optionally line-searching the step size.
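
These steps translate into a short PyTorch sketch under stated assumptions: `model`, `loss_fn(outputs, targets)`, and the `(inputs, targets)` batch tuples are placeholders, gradients are flattened into a single vector for clarity, and the per-example loop is the simplest (not the fastest) way to estimate $T_r$.

```python
import torch

def gu_step(model, loss_fn, retain_batch, forget_batch, lr=1e-3, k=16):
    """One projected unlearning step (steps 1-5 above); sketch only."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(loss):
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.reshape(-1) for g in grads])

    # Steps 1-2: per-example retain gradients -> orthonormal basis of T_r.
    xr, yr = retain_batch
    G = torch.stack([flat_grad(loss_fn(model(xr[i:i+1]), yr[i:i+1]))
                     for i in range(min(k, len(xr)))])   # (k, p)
    Q, _ = torch.linalg.qr(G.T)                          # (p, k), spans T_r

    # Steps 3-4: forget gradient projected onto T_r^perp (only the
    # forget component of the bi-objective gradient needs projecting).
    xf, yf = forget_batch
    g_f = flat_grad(loss_fn(model(xf), yf))
    g_perp = g_f - Q @ (Q.T @ g_f)
    g_perp /= g_perp.norm()

    # Step 5: apply the normalized retain-invariant update.
    with torch.no_grad():
        offset = 0
        for p in params:
            n = p.numel()
            p -= lr * g_perp[offset:offset + n].view_as(p)
            offset += n
```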

In high-dimensional models (e.g., LLMs or vision transformers), projections are often restricted to selected layers. Pragmatically, the basis dimension $k$ (the rank of $T_r$) is kept small (e.g., $k = 16$–$64$), and the extra computational cost remains modest (Zhou et al., 21 Nov 2025).

An alternative parameter-space approach, TBAR (Triggered Backdoor Attenuation via Removal), applies GU to backdoor forgetting by treating malicious behaviors as low-dimensional task vectors $\tau_t$ and projecting onto the complement of their span (Abdelraheem et al., 16 Oct 2025). Formally, a backdoored model has parameters

$$\theta_b \approx \theta_{\mathrm{pre}} + \tau_c + \tau_t,$$

and unlearning is performed by subtracting $\alpha \tau_t$, where $\alpha$ is tuned on validation data to balance removal efficacy against clean accuracy.
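
In the task-arithmetic view this amounts to plain parameter-space vector arithmetic. A hedged sketch follows; estimating $\tau_t$ by fine-tuning the pretrained model on triggered data alone is an assumption here, and TBAR's full procedure also involves a projection step.

```python
import torch

def subtract_task_vector(theta_b, theta_pre, theta_trigger, alpha):
    """Remove a backdoor task vector: theta <- theta_b - alpha * tau_t.

    All arguments are state dicts; tau_t = theta_trigger - theta_pre is
    one illustrative way to estimate the trigger task vector.
    """
    cleaned = {}
    for name in theta_b:
        tau_t = theta_trigger[name] - theta_pre[name]
        cleaned[name] = theta_b[name] - alpha * tau_t
    return cleaned

# alpha is swept on a validation set to trade off attack success rate
# against clean accuracy, as described above.
```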

At the representation level, POUR (Provably Optimal Unlearning of Representations) realizes GU by projecting penultimate-layer features onto the complement of the forgotten class's direction in the simplex equiangular tight frame (ETF), preserving geometric optimality for the retained classes under the neural collapse phenomenon (Le et al., 24 Nov 2025).

4. GU in Representation Space: Neural Collapse and the POUR Framework

Under neural collapse (NC) at terminal training, penultimate-layer features and the final-layer classifier weights form a simplex ETF structure. Let $v_i$ denote the class-$i$ mean in $\mathbb{R}^{C-1}$ with ETF properties:

$$\|v_i\| = 1, \qquad v_i^\top v_j = -\frac{1}{C-1}\ (i \neq j), \qquad \sum_{i=1}^{C} v_i = 0.$$

For unlearning class $u$, the closed-form optimal GU operator is the orthogonal projector

$$P = I - \frac{v_u v_u^\top}{\|v_u\|^2},$$

which collapses the class-$u$ direction while maintaining a perfect ETF for the remaining $C-1$ classes in a lower-dimensional subspace (Le et al., 24 Nov 2025).
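
A minimal NumPy sketch (illustrative, not the papers' code) constructs a simplex ETF, applies the closed-form projector, and checks both claims: the class-$u$ mean collapses, and the retained means remain an ETF.

```python
import numpy as np

C, u = 5, 2  # number of classes and the class to unlearn (illustrative)

# Simplex ETF: C unit vectors with pairwise inner product -1/(C-1).
# Embedded in R^C (rank C-1); equivalent up to rotation to the
# R^{C-1} formulation in the text.
V = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

# Closed-form GU projector removing the class-u direction.
v_u = V[u]
P = np.eye(C) - np.outer(v_u, v_u) / (v_u @ v_u)

# The forgotten class mean collapses to the origin...
assert np.allclose(P @ v_u, 0.0)

# ...while the retained means keep equal norms and equal pairwise
# angles, i.e., they remain an ETF in a lower-dimensional subspace.
R = np.array([P @ V[i] for i in range(C) if i != u])
norms = np.linalg.norm(R, axis=1)
gram = (R / norms[:, None]) @ (R / norms[:, None]).T
assert np.allclose(norms, norms[0])
off_diag = gram[~np.eye(C - 1, dtype=bool)]
assert np.allclose(off_diag, off_diag[0])   # equals -1/(C-2)
```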

At the classifier level, one may

  • project the feature-extractor output $z \mapsto Pz$ ("PU-Z"), or
  • project the final-layer weights $W \mapsto PW$ ("PU-W"),

both eliminating recoverable information about $u$ in a single operation; a sketch of both variants follows the list.
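
Both variants are single linear operations. A hedged PyTorch sketch, assuming a feature extractor `backbone` and a linear head `fc` whose weight rows live in feature space (the names are illustrative, and $P$ is the symmetric projector from above):

```python
import torch

@torch.no_grad()
def pu_w(fc, P):
    # PU-W: project each classifier row w_c -> P w_c; since the
    # projector P is symmetric, this is W @ P for W of shape (C, d).
    fc.weight.copy_(fc.weight @ P)

def pu_z(backbone, P, x):
    # PU-Z: project penultimate features z -> P z (batched rows: Z @ P).
    return backbone(x) @ P
```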

To propagate unlearning deeper into feature representations, POUR-D applies projection-guided distillation: a student network is trained to match projected (collapsed) features on the forget set, enforcing ETF structure preservation by $L_2$ alignment (Le et al., 24 Nov 2025).
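
A minimal sketch of this distillation objective, assuming a frozen `teacher`, a trainable `student`, and plain MSE for the $L_2$ alignment (whether POUR-D adds a retain-set term exactly as below is an assumption):

```python
import torch
import torch.nn.functional as F

def pour_d_loss(student, teacher, P, x_forget, x_retain):
    # Forget set: student features should match the teacher's features
    # collapsed through the projector P (class-u direction removed).
    z_f_target = teacher(x_forget).detach() @ P
    loss_forget = F.mse_loss(student(x_forget), z_f_target)
    # Retain set: student features track the teacher unchanged,
    # preserving the ETF structure for the remaining classes.
    z_r_target = teacher(x_retain).detach()
    loss_retain = F.mse_loss(student(x_retain), z_r_target)
    return loss_forget + loss_retain
```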

5. Experimental Evaluation and Metrics

GU has been benchmarked across LLM unlearning benchmarks (TOFU, MUSE), backdoored vision models, and image datasets (CIFAR-10/100, PathMNIST). Metrics in use include:

  • Extraction Strength (ES): quantifies membership inference or risk association on forget/retain splits (lower ES on $D_f$ and higher on $D_r$ is desirable).
  • Clean Accuracy (CA): accuracy on retained/clean data.
  • Attack Success Rate (ASR): for backdoor unlearning, the rate at which the trigger still succeeds after unlearning.
  • Adaptive Unlearning Score (AUS): for classification-level selectivity.
  • Membership Inference Attack Rate (rMIA): privacy leakage at the representation level.
  • Representation Unlearning Score (RUS): combines CKA similarities between pre- and post-unlearning representations on the forget and retain sets, balancing erasure of forgotten information against retention of clean knowledge.
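
Since RUS builds on CKA, a standard linear-CKA computation is sketched below for reference; how RUS combines the forget- and retain-set CKA terms is specified in the paper and not reproduced here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices of shape
    (n_samples, dim); higher means more similar representations."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))

# RUS-style usage: low CKA between pre/post-unlearning features on the
# forget set, high CKA on the retain set.
```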

Empirically, GU consistently shifts the Pareto frontier, reducing ES/ASR on the forget set without degrading, and in some instances improving, CA/ES on the retain set. POUR achieves instantaneous and perfect class forgetting at the representation level with negligible collateral impact under neural collapse, verified by t-SNE plots and CKA statistics (Le et al., 24 Nov 2025, Zhou et al., 21 Nov 2025, Abdelraheem et al., 16 Oct 2025).

6. Limitations, Applicability, and Future Directions

GU methods are most effective for large models and scenarios where the entanglement between forget and retain directions is significant. The primary limitations are:

  • Local linearity: The first-order orthogonality guarantee assumes small-step updates; large-step nonlinearity may violate theoretical retain-invariance (Zhou et al., 21 Nov 2025).
  • Approximate retain subspaces: practical estimation of $T_r$ via mini-batching or restricted layers may incompletely cover the true retain subspace.
  • Computational overhead: Basis maintenance and projections incur additional, but typically moderate, costs.

A plausible implication is that GU may not fully generalize when the geometric structures (e.g., the ETF) are poorly realized or when higher-order interactions dominate. Future work may incorporate second-order corrections, adaptive rank determination for $T_r$, or certified global guarantees (Zhou et al., 21 Nov 2025). Extensions to other unlearning targets, such as spurious correlations or privacy-sensitive concepts, are enabled by the generality of the underlying geometric principle (Abdelraheem et al., 16 Oct 2025).

7. Comparative Table of Representative GU Methods

| Method | Geometry Space | Core Operation |
|--------|----------------|----------------|
| GU (Zhou et al., 21 Nov 2025) | Parameter | Orthogonal projection of the forget gradient onto $T_r^\perp$ |
| TBAR (Abdelraheem et al., 16 Oct 2025) | Parameter | Linear task-vector subtraction and projection for backdoor removal |
| POUR-P / POUR-D (Le et al., 24 Nov 2025) | Representation | Simplex-ETF projection (POUR-P), plus distillation for deep unlearning (POUR-D) |

These approaches represent the state-of-the-art for first-order, provably "retain-safe" selective forgetting in neural networks, applicable to privacy, safety, and behavior-narrowing scenarios.
