
Deep Feature-Space Forgetting

Updated 8 January 2026
  • Deep feature-space forgetting is the degradation of learned neural representations as a network updates, resulting in entangled classes and misclassifications.
  • It is quantified using metrics like linear probe drop, Wasserstein distance, and eigenspace overlap to assess hidden representation shifts.
  • Mitigation techniques such as null-space projection and backward feature mapping help preserve feature separability and ensure effective unlearning.

Deep feature-space forgetting is the phenomenon wherein the internal representations or embedding spaces of deep neural networks, built to encode previously learned knowledge, drift or collapse as models are updated to accommodate new tasks or data distributions. Unlike shallow forgetting, which pertains to output-layer or classifier-level performance metrics, deep feature-space forgetting directly undermines the geometric or statistical structure of learned features—often rendering previously separable classes entangled and impairing generalization, transfer, and privacy guarantees. The subject spans continual learning, model fine-tuning, and machine unlearning, and has catalyzed a range of mitigation and analysis methodologies.

1. Formal Definitions and Measurement

Feature-space forgetting is formally characterized by changes in the internal feature representations $\psi_\ell^t(x)\in\mathbb{R}^{d_\ell}$ after sequential training on tasks $1,\ldots,t$; forgetting occurs when, for an input $x$ from an earlier task $r<t$, the norm $\|\psi_\ell^t(x)-\psi_\ell^{t-1}(x)\|$ becomes sufficiently large to cause misclassification (Sahbi et al., 2021). This is distinguished from shallow forgetting, measured at model outputs, by evaluating the drop in linear probe accuracy $A^\star_{ij}$ on frozen features versus the actual classifier accuracy $A_{ij}$ (Lanzillotta et al., 8 Dec 2025).
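
This drift can be estimated directly from two checkpoints. Below is a minimal sketch, assuming PyTorch feature extractors `model_prev` and `model_curr` that return the layer-$\ell$ embeddings and a data loader over an earlier task's inputs; all identifiers are illustrative.

```python
import torch

@torch.no_grad()
def mean_feature_drift(model_prev, model_curr, old_task_loader, device="cpu"):
    """Average ||psi_l^t(x) - psi_l^{t-1}(x)|| over inputs x from an earlier task."""
    model_prev.eval()
    model_curr.eval()
    drifts = []
    for x, _ in old_task_loader:
        x = x.to(device)
        psi_prev = model_prev(x)  # features under the pre-update checkpoint
        psi_curr = model_curr(x)  # features under the post-update checkpoint
        drifts.append((psi_curr - psi_prev).norm(dim=1))
    return torch.cat(drifts).mean().item()
```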

Empirical measurement leverages:

  • Linear Probe Drop: $\Delta_{\rm LP} = \mathcal{A}_{\rm LP}(f_\theta;\mathcal{D}_{\rm test}) - \mathcal{A}_{\rm LP}(f_{\hat\theta};\mathcal{D}_{\rm test})$ (Huang et al., 30 May 2025); a minimal sketch of this computation follows the list.
  • Feature Distribution Distances: Wasserstein distance, MMD, or dataset-level optimal transport (OTDD) between feature sets pre- and post-update (Huang et al., 30 May 2025, Choi et al., 2024).
  • Forgetting Metrics for Generative Models: Per-task drift in data or feature distributions, e.g., the forgetfulness score $FS_t$ and its compensated variants (Lao et al., 2020).
  • Dimensional Alignment: Frobenius-norm overlap of eigenspaces, measuring how well the forget set is aligned with the retained set's principal components (Seo et al., 2024).
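
The linear probe drop referenced above can be computed from frozen features alone. A minimal sketch, assuming features have already been extracted as arrays from the backbone before and after the update, and that a logistic-regression probe is an acceptable stand-in for the probe used in the cited work:

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_acc(train_feats, train_y, test_feats, test_y):
    """Test accuracy of a linear probe fit on frozen features."""
    probe = LogisticRegression(max_iter=2000).fit(train_feats, train_y)
    return probe.score(test_feats, test_y)

def linear_probe_drop(before, after, y_train, y_test):
    """Delta_LP: probe accuracy on pre-update features minus post-update features.

    `before` and `after` are (train_feats, test_feats) array pairs extracted
    from the frozen backbone before and after fine-tuning.
    """
    return (linear_probe_acc(before[0], y_train, before[1], y_test)
            - linear_probe_acc(after[0], y_train, after[1], y_test))
```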

These metrics reveal that representation-level forgetting can be as catastrophic, in relative terms, as output-level forgetting (Hess et al., 2023), and that it is only superficially masked when non-normalized metrics are used.

2. Underlying Mechanisms and Theoretical Insights

Standard SGD or fine-tuning in deep networks tends to update all weights indiscriminately, which shifts feature directions used for earlier tasks (Sahbi et al., 2021). In replay-based continual learning, even minimal buffers asymptotically guarantee retention of feature linear separability, yet shallow forgetting persists due to statistical artifacts such as strong collapse and covariance rank deficiency (Lanzillotta et al., 8 Dec 2025). The phenomenon is further elucidated via Neural Collapse extensions, showing that the active feature subspace $S$ is preserved under replay, but information outside $S$ (i.e., in $S^\perp$) undergoes rapid decay and drift.

Mitigation hinges on decomposing weight updates:

  • Null-space projection: Restrict $\Delta W$ at each layer to the orthogonal complement of previously used principal feature directions, thereby freezing old representations (Sahbi et al., 2021); a sketch of this projection follows the list.
  • Backward Feature Projection: Enforce that new-task features change only up to a learnable linear transformation of the old features, preserving separability while allowing plasticity (Gu et al., 2023).
  • Gating adapters: Binary masks selectively freeze inactive features or units during task updates, preserving reusable representations (Zhang et al., 2022).
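
A hedged sketch of the first strategy: project a linear layer's candidate update onto the null-space of the inputs that old tasks feed into that layer. The SVD-based basis and the energy threshold are illustrative choices, not the exact FFNB procedure.

```python
import torch

def nullspace_project(delta_w, old_feats, energy=0.99):
    """Project a weight update onto the null-space of old-task input features.

    delta_w  : (d_out, d_in) candidate update for a linear layer y = W x
    old_feats: (n, d_in) activations fed into the layer by old-task data
    energy   : fraction of spectral energy defining the "used" subspace
    """
    _, s, vt = torch.linalg.svd(old_feats, full_matrices=False)
    # Number of principal directions needed to cover the requested energy.
    k = int(torch.searchsorted(torch.cumsum(s ** 2, 0) / (s ** 2).sum(), energy)) + 1
    v_k = vt[:k].T  # (d_in, k) basis of directions used by old tasks
    # Remove the components of the update that would perturb old-task outputs.
    return delta_w - delta_w @ v_k @ v_k.T
```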

In unlearning and privacy contexts, “deep forgetting” demands the contraction or disconnection of forget-set embeddings—typically to a single point in feature space—such that no post-hoc attack (e.g., inversion or head recovery) can reconstruct their semantics (Jung et al., 10 Jul 2025).

3. Architectures and Algorithms for Forgetting-Free Learning

Advanced architectures address feature-space forgetting with dedicated mechanisms:

Method | Strategy | Forgetting Mitigation
FFNB (Sahbi et al., 2021) | Null-space projection | Restricts new-task updates to the null-space of prior tasks' features
BFP (Gu et al., 2023) | Learnable linear mapping | Preserves linear separability while allowing plasticity
ER-FSL (Lin, 2024) | Feature subspace partitioning | Decouples learning and replay subspaces

FFNB: Null-Space Constraint and End-to-End Training

In FFNB, fully-connected layers are partitioned into bands per task, and the trainable band for each new task is projected strictly onto the null-space of prior tasks' feature matrices. Classifier updates utilize incremental Fisher Discriminant Analysis, and all updates during fine-tuning maintain the null-space constraint to ensure frozen representations (Sahbi et al., 2021).

BFP: Backward Feature Mapping

BFP relaxes the rigidity of strict feature matching by learning a task-specific linear map $A$ such that $A h_t(x) \approx h_{t-1}(x)$ for all $x$ from earlier tasks. This preserves the principal directions required for old tasks while allowing new directions to emerge for novel classes (Gu et al., 2023).
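
A minimal sketch of such a backward projection loss, assuming `feats_new` and `feats_old` are embeddings of the same replayed inputs under the current backbone and a frozen copy of the previous-task model; the training schedule and loss weighting are omitted.

```python
import torch
import torch.nn as nn

class BackwardFeatureProjection(nn.Module):
    """Learnable map A trained so that A h_t(x) approximates h_{t-1}(x)."""

    def __init__(self, feat_dim):
        super().__init__()
        self.A = nn.Linear(feat_dim, feat_dim, bias=False)

    def forward(self, feats_new, feats_old):
        # feats_new: h_t(x) from the current model (gradients flow into the backbone)
        # feats_old: h_{t-1}(x) from the frozen previous model (treated as a target)
        return ((self.A(feats_new) - feats_old.detach()) ** 2).mean()
```

In training, this term would typically be added to the new-task loss with a weighting coefficient, so gradients reach both the backbone and $A$.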

ER-FSL: Partitioned Feature Subspaces

ER-FSL allocates independent feature subspaces per task, using heuristic selection and reuse based on classifier weight variance. Replay reinforces old knowledge in the union of all subspaces, empirically mitigating catastrophic collapse of class supports (Lin, 2024).
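
A rough illustration of the decoupling idea only; ER-FSL's actual subspace selection and reuse heuristic is more involved, and the equal-width slices, identifiers, and replay wiring below are assumptions.

```python
import torch

def task_slices(feat_dim, num_tasks):
    """Split the feature vector into disjoint, equal-width per-task subspaces."""
    width = feat_dim // num_tasks
    return [slice(t * width, (t + 1) * width) for t in range(num_tasks)]

def mask_to_subspace(feats, sl):
    """Zero out every feature coordinate outside the given subspace slice."""
    out = torch.zeros_like(feats)
    out[:, sl] = feats[:, sl]
    return out

def step_loss(new_feats, new_y, buf_feats, buf_y, classifier, slices, task_id, criterion):
    """New-task samples learn only in their own slice; replayed samples use all slices."""
    new_loss = criterion(classifier(mask_to_subspace(new_feats, slices[task_id])), new_y)
    replay_loss = criterion(classifier(buf_feats), buf_y)
    return new_loss + replay_loss
```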

4. Feature-Space Regularization and Replay Strategies

For generative models, catastrophic feature drift can be reduced by direct regularization in feature space. FoCL implements a feature-space divergence penalty between the current generator and previous-task generators, using either squared Euclidean or Wasserstein metrics on encoded features (Lao et al., 2020). Proxy-FDA extends the principle for fine-tuning large vision models, explicitly matching nearest-neighbor graph structures in feature space and synthesizing informative proxies (Huang et al., 30 May 2025). Data-free replay techniques, such as NIFF, forge synthetic instance-level features for old classes based only on stored per-class statistics, enabling highly memory-efficient, privacy-oriented continual object detection (Guirguis et al., 2023).
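
A minimal sketch of a squared-Euclidean feature-space penalty in the spirit of FoCL, assuming a fixed encoder, the generator currently being trained, a frozen previous-task generator, and a shared batch of latent codes; these names and the choice of distance are illustrative.

```python
import torch

def feature_space_penalty(encoder, gen_curr, gen_prev, z):
    """Squared-Euclidean divergence between current and previous generator features.

    encoder  : fixed feature extractor mapping samples to embeddings
    gen_curr : generator being trained on the new task
    gen_prev : frozen snapshot of the generator after the previous task
    z        : batch of latent codes shared by both generators
    """
    with torch.no_grad():
        target = encoder(gen_prev(z))  # previous-task feature statistics (no gradients)
    current = encoder(gen_curr(z))     # gradients flow into gen_curr only
    return ((current - target) ** 2).sum(dim=1).mean()
```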

Empirical studies confirm that feature-space regularization—particularly when structure-aware (graph-based)—mitigates concept forgetting (e.g., $\Delta_{\rm LP} \approx +1.6\%$ for Proxy-FDA vs. $-4.4\%$ for naive fine-tuning) (Huang et al., 30 May 2025). Flat forgetfulness curves in $FS_t$ and its compensated variants are markers of resistance to feature drift (Lao et al., 2020).

5. Deep Feature-Space Forgetting in Machine Unlearning

Machine unlearning requires not only output-level erasure but semantic collapse of the forget set in feature space. Principal approaches include:

  • Dimensional Alignment (DA): Measures the projection of the forget-set covariance onto retained-set eigenspaces; DA regularization pushes forget-set features into the high-alignment region of the retained-set manifold (Seo et al., 2024); a hedged sketch of such an eigenspace-overlap score follows this list.
  • One-Point-Contraction (OPC): Drives all forget-set embeddings into a small ball near the origin, fundamentally destroying discriminative power and resilience to membership-inference or recovery attacks (Jung et al., 10 Jul 2025).
  • SVD-Based Deep Unlearning: Orthogonally projects out class-discriminatory directions via per-layer SVD and single-shot weight updates, achieving unlearning efficacy with minimal data and compute overhead (Kodge et al., 2023).
  • Distribution-Level Feature Distancing (DLFD): Employs optimal transport to maximize the divergence between feature distributions of retain and forget sets, preserving label correlations while achieving superior NoMUS (combined utility + privacy) scores versus instance-level adversarial or gradient-based methods (Choi et al., 2024).
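
Referring back to the dimensional-alignment idea above, one plausible instantiation of an eigenspace-overlap score (not necessarily the exact normalization of Seo et al.) projects centered forget-set features onto the top-$k$ principal directions of the retained-set covariance and compares energies:

```python
import torch

def dimensional_alignment(retain_feats, forget_feats, k=10):
    """Fraction of forget-set feature energy lying in the retained set's top-k eigenspace."""
    retain = retain_feats - retain_feats.mean(0, keepdim=True)
    forget = forget_feats - forget_feats.mean(0, keepdim=True)
    cov = retain.T @ retain / (retain.shape[0] - 1)  # retained-set feature covariance
    _, eigvecs = torch.linalg.eigh(cov)              # eigenvectors, ascending eigenvalues
    u_k = eigvecs[:, -k:]                            # top-k retained-set directions
    projected = forget @ u_k
    return (projected.norm() ** 2 / forget.norm() ** 2).item()
```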

Invariance via adversarial forgetting is achieved by masking out factors correlated with undesired nuisance or bias variables, using an adversarial discriminator to enforce information bottlenecks in feature flow (Jaiswal et al., 2019).
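
A generic gradient-reversal sketch of this idea, not the exact architecture of Jaiswal et al.: a soft mask gates the features, and a discriminator trained to predict the nuisance variable sends reversed gradients into the backbone. The mask shape, discriminator head, and loss wiring are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

class AdversarialForgetting(nn.Module):
    """Mask features, then ask a discriminator to recover the nuisance variable."""

    def __init__(self, feat_dim, num_nuisance_classes):
        super().__init__()
        self.mask = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.discriminator = nn.Linear(feat_dim, num_nuisance_classes)

    def forward(self, feats):
        masked = feats * self.mask(feats)  # soft mask over feature dimensions
        # Reversed gradients push the backbone to strip nuisance information.
        nuisance_logits = self.discriminator(GradReverse.apply(masked))
        return masked, nuisance_logits
```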

6. Empirical Evidence and Benchmark Findings

Across standard benchmarks (CIFAR-100, TinyImageNet, ImageNet, COCO, VOC) and generative/representation learning lifecycles:

  • FFNB maintained $\approx 84\%$ accuracy after 8 tasks on SBU Skeleton (vs. $12\%$ for the incremental baseline) and $\approx 67\%$ on FPHA after 45 tasks (Sahbi et al., 2021).
  • BFP improved DER++ final average accuracy (FAA) by up to $6.8\%$ on Split-CIFAR10 and $8.6\%$ on Split-CIFAR100, with corresponding drops in forgetting of more than $10\%$ (Gu et al., 2023).
  • Proxy-FDA yielded positive $\Delta_{\rm LP}$ on fine-tuned CLIP models, outperforming both point-wise L2 regularization and naive fine-tuning (Huang et al., 30 May 2025).
  • OPC achieved near-retrained forgetting efficacy with high robustness to gradient inversion and performance recovery on CIFAR-10 and TinyImageNet; CKA similarity between pre- and post-unlearning embeddings dropped to zero for forget sets (Jung et al., 10 Jul 2025).
  • DLFD surpassed previous unlearning baselines in NoMUS while keeping accuracy drops under $2\%$, demonstrating the benefit of distribution-level forgetting (Choi et al., 2024).
  • Adapters in continual representation learning reliably closed half the gap to multi-task representation quality, with $P_{rep}$ (a forward-transfer metric) improved by $2\%$ to $3\%$ (Zhang et al., 2022).
  • Auto DeepVis localized feature drift to specific blocks, and critical freezing recovered up to $8$ BLEU-1 points on past tasks compared to unconstrained fine-tuning for captioning models (Nguyen et al., 2020).

7. Controversies, Open Problems, and Research Directions

The distinction and interplay between shallow and deep forgetting remain areas of active inquiry (Lanzillotta et al., 8 Dec 2025). Statistical artifacts in buffer-based replay—such as minor collapses inflating class means—can confound classifier-level optimization even when feature separability remains high. There is also accumulating evidence that mere output-level unlearning is insufficient for privacy and compliance, as internal representations often retain extractable class-discriminatory information unless deep contraction or dimensional alignment is enforced (Jung et al., 10 Jul 2025, Seo et al., 2024).

Recommendations for future work include exploring feature adaptation as a lightweight, robust alternative to parameter fine-tuning, extending unlearning frameworks to unsupervised, segmentation, and foundation-model contexts, and refining evaluation metrics beyond accuracy and MIA rates toward feature consistency and semantic recoverability (Wang et al., 22 Oct 2025, Kodge et al., 2023).

References

  • "FFNB: Forgetting-Free Neural Blocks for Deep Continual Visual Learning" (Sahbi et al., 2021)
  • "NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging" (Guirguis et al., 2023)
  • "Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse" (Lanzillotta et al., 8 Dec 2025)
  • "FoCL: Feature-Oriented Continual Learning for Generative Models" (Lao et al., 2020)
  • "Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting" (Huang et al., 30 May 2025)
  • "ER-FSL: Experience Replay with Feature Subspace Learning for Online Continual Learning" (Lin, 2024)
  • "Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting" (Hess et al., 2023)
  • "Deep Unlearning: Fast and Efficient Gradient-free Approach to Class Forgetting" (Kodge et al., 2023)
  • "Dissecting Catastrophic Forgetting in Continual Learning by Deep Visualization" (Nguyen et al., 2020)
  • "Preserving Linear Separability in Continual Learning by Backward Feature Projection" (Gu et al., 2023)
  • "Revisiting Machine Unlearning with Dimensional Alignment" (Seo et al., 2024)
  • "Feature Space Adaptation for Robust Model Fine-Tuning" (Wang et al., 22 Oct 2025)
  • "Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting" (Choi et al., 2024)
  • "Feature Forgetting in Continual Representation Learning" (Zhang et al., 2022)
  • "Invariant Representations through Adversarial Forgetting" (Jaiswal et al., 2019)
  • "OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting" (Jung et al., 10 Jul 2025)