Deep Feature & Classifier Forgetting in Neural Networks
- Deep feature-space forgetting describes the degradation and drift of latent representations, reducing linear separability of learned classes.
- Shallow classifier-level forgetting occurs when classifier boundaries degrade after new data introduction, leading to misclassification despite stable features.
- Integrating minimal replay and regularization techniques helps mitigate both forgetting phenomena, preserving performance in continual learning.
Deep feature-space forgetting and shallow classifier-level forgetting are distinct phenomena impacting the stability and reliability of deep neural networks under continual, incremental, or selective forgetting settings. Deep forgetting refers to representational drift or degradation within latent feature spaces, resulting in loss of linear separability and semantic content for previously learned inputs. Shallow forgetting pertains to the overwriting or misalignment of classifier boundaries trained over (potentially stable) features, causing old classes to be misclassified even if their internal representations remain informative. The explicit distinction between these two mechanisms is foundational to understanding catastrophic forgetting, empirical accuracy decay, and privacy guarantees in neural systems.
1. Conceptual Distinction: Feature-Space vs. Classifier-Level Forgetting
Catastrophic forgetting emerges in sequential/continual learning when model performance degrades on earlier tasks after incorporating new data or tasks. The phenomenon has two principal forms:
- Deep feature-space forgetting: Representations extracted by deep backbones (e.g., ResNet encoders, ViT blocks) drift—mean and covariance of feature embeddings for previously seen classes become misaligned or contract into low-variance subspaces. This impairs linear separability and erases semantic distinctions intrinsic to old data (Gu et al., 2023, Lanzillotta et al., 8 Dec 2025).
- Shallow classifier-level forgetting: Even with stable features, decision boundaries instantiated by shallow linear layers or SVMs can be overwritten or crowded by newly added classes and noisy exemplars. Negative pool reallocation or imbalanced representation can render old classifier weights suboptimal, resulting in measurable accuracy degradation (Belouadah et al., 2018, Wu et al., 8 Feb 2025).
A critical insight from several works (Lanzillotta et al., 8 Dec 2025, Wu et al., 8 Feb 2025) is that deep feature-space forgetting can be neutralized by minimal replay (anchoring global feature means), whereas shallow forgetting often persists, requiring dedicated head regularization, replay, or subspace anchoring.
2. Mathematical Formalizations and Measurement Protocols
Quantification of both forms of forgetting relies on carefully constructed metrics:
| Forgetting Mode | Metric/Definition | Key Reference |
|---|---|---|
| Deep feature-space | Drop in linear-probe accuracy on frozen features; CKA similarity between old and new representations | (Lanzillotta et al., 8 Dec 2025, Hess et al., 2023) |
| Shallow classifier-level | Rank-deficient covariance and mean inflation of head statistics; drop in classification accuracy / mAP_cls | (Belouadah et al., 2018, Wu et al., 8 Feb 2025) |
- Linear probing: Training a fresh linear classifier post-hoc on frozen features is a robust, standardized metric for feature-space forgetting (Hess et al., 2023, Gu et al., 2023); a minimal probe-and-CKA sketch follows this list.
- Exclusion baseline (EXC): Comparing LP accuracy between a model trained on all tasks and one excluding a target task, with near-zero values after long training indicative of complete re-learnability from shared knowledge (Hess et al., 2023).
- Classifier forgetting: Directly measured by drop in classification accuracy or AP when only the classifier head is altered, confirming that shallow drift dominates in multi-head or detector configurations (Wu et al., 8 Feb 2025, Belouadah et al., 2018).
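To make these metrics concrete, the sketch below estimates feature-space forgetting as the drop in linear-probe accuracy together with the linear CKA similarity between pre- and post-task representations. It assumes features have already been extracted with a frozen backbone; the array names and the scikit-learn probe are illustrative choices, not the exact protocols of the cited papers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(feats_train, y_train, feats_test, y_test):
    """Fit a fresh linear classifier on frozen features and report test accuracy."""
    probe = LogisticRegression(max_iter=2000)
    probe.fit(feats_train, y_train)
    return probe.score(feats_test, y_test)

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Illustrative usage: feats_old_* come from the model before the new task,
# feats_new_* from the same inputs after the new task is learned.
# acc_old = linear_probe_accuracy(feats_old_tr, y_tr, feats_old_te, y_te)
# acc_new = linear_probe_accuracy(feats_new_tr, y_tr, feats_new_te, y_te)
# deep_forgetting = acc_old - acc_new                    # drop in linear-probe accuracy
# drift = 1.0 - linear_cka(feats_old_te, feats_new_te)   # representational drift
```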
3. Architectural and Algorithmic Approaches
Deep-shallow forgetting motivates diverse algorithmic interventions:
- Frozen Deep Representations (DeeSIL): DeeSIL (Belouadah et al., 2018) freezes the deep feature extractor after initial training; only shallow, independent SVM classifiers are added for new classes. This eliminates deep feature-space forgetting by construction, but residual shallow forgetting arises from negative pool limitations. Adaptive negative selection (div, rand, ind) can mitigate shallow boundary drift (a frozen-extractor sketch appears after this list).
- Backward Feature Projection (BFP): BFP (Gu et al., 2023) regularizes feature drift by constraining old representations to remain recoverable as a (full-rank) linear projection of the new feature map. This preserves hyperplane separability in feature space, while new task information is allocated to the null space of the learned projection (see the regularizer sketch after this list).
- Null-space Projection (NSGP, FFNB): NSGP (Wu et al., 8 Feb 2025) and FFNB (Sahbi et al., 2021) employ gradient projection into the null space of past-task input covariances, ensuring that backbone updates cannot alter old feature projections. Classifier boundaries are maintained with FDA (Fisher Discriminant Analysis) or prototype replay (a gradient-projection sketch appears after this list).
- Feature Distillation and Generative Replay: Approaches such as Generative Feature Replay (Liu et al., 2020) use feature-level GANs to generate representative features for replay, combined with feature distillation to anchor deep representations.
- Selective and Deep Unlearning: OPC (Jung et al., 10 Jul 2025) and Deep Unlearning (Kodge et al., 2023) go beyond classifier-level manipulation, enforcing one-point contraction on forget-set embeddings or SVD-based discriminatory subspace removal, respectively, thus performing "deep" erasure of forget-set information, resistant to recovery attacks (a simplified subspace-erasure sketch appears after this list).
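The frozen-extractor pattern behind DeeSIL-style methods can be sketched as follows; the feature-extraction loop and the one-vs-rest LinearSVC with a randomly sampled negative pool are illustrative assumptions and do not reproduce the paper's div/rand/ind selection strategies.

```python
import numpy as np
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(backbone, loader, device="cpu"):
    """Run a frozen backbone over a dataloader and collect (features, labels)."""
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x.to(device)).flatten(1).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def fit_class_svm(feats, labels, new_class, negative_pool_size=5000, seed=0):
    """Fit one independent linear SVM for a new class against a sampled negative pool.

    Deep forgetting is zero by construction (the backbone never changes);
    the quality of the negative pool is what governs shallow boundary drift.
    """
    rng = np.random.default_rng(seed)
    pos = feats[labels == new_class]
    neg_all = feats[labels != new_class]
    idx = rng.choice(len(neg_all), min(negative_pool_size, len(neg_all)), replace=False)
    X = np.concatenate([pos, neg_all[idx]])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(idx))])
    return LinearSVC(C=1.0).fit(X, y)
```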
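A minimal sketch of a backward-feature-projection style regularizer, assuming a plain MSE penalty and a learnable square matrix A (the full BFP method combines such a term with the usual task and distillation losses): old-task features are kept recoverable as a linear projection of the new features, so old linear decision boundaries remain usable.

```python
import torch
import torch.nn as nn

class BackwardFeatureProjection(nn.Module):
    """Learnable linear map A; penalizes || A f_new(x) - f_old(x) ||^2."""

    def __init__(self, feat_dim):
        super().__init__()
        # Initialized at identity so the penalty starts as plain feature distillation.
        self.A = nn.Parameter(torch.eye(feat_dim))

    def forward(self, feats_new, feats_old):
        projected = feats_new @ self.A.T          # project new features backward
        return ((projected - feats_old.detach()) ** 2).mean()

# Illustrative training step (model, old_model, task_loss, lam are assumed to exist):
# feats_new = model.backbone(x)
# with torch.no_grad():
#     feats_old = old_model.backbone(x)
# loss = task_loss(model.head(feats_new), y) + lam * bfp(feats_new, feats_old)
```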
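Null-space gradient projection in the spirit of NSGP/FFNB can be sketched as below; the eigendecomposition-based projector and the variance threshold are assumptions for illustration. Gradients of a layer's weights are projected onto the approximate null space of the uncentered covariance of past-task inputs, so updates cannot move old feature projections.

```python
import torch

def null_space_projector(past_inputs, var_threshold=0.99):
    """Projector onto the approximate null space of past-task layer inputs.

    past_inputs: (n_samples, dim) matrix of inputs collected on old tasks.
    Directions carrying `var_threshold` of the spectral mass are protected.
    """
    cov = past_inputs.T @ past_inputs / past_inputs.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)            # ascending eigenvalues
    eigvals, eigvecs = eigvals.flip([0]), eigvecs.flip([1])
    keep = torch.cumsum(eigvals, 0) / eigvals.sum() <= var_threshold
    U = eigvecs[:, keep]                                 # protected subspace
    return torch.eye(cov.shape[0]) - U @ U.T             # P = I - U U^T

def project_gradient(weight_grad, P):
    """Right-multiply the gradient of a (out, in) weight by the input-space projector,
    so the update is (approximately) zero on past-task input directions."""
    return weight_grad @ P
```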
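A simplified, generic illustration of SVD-based subspace erasure in the spirit of Deep Unlearning (this is not the exact algorithm of Kodge et al.; the subspace sizes and the deflation step are assumptions): forget-class activation directions that are not shared with retained classes are removed by projecting a layer's weights onto their orthogonal complement.

```python
import torch

def top_subspace(acts, k):
    """Top-k right singular directions of an activation matrix (n_samples, dim)."""
    _, _, Vh = torch.linalg.svd(acts, full_matrices=False)
    return Vh[:k].T                                      # (dim, k) orthonormal basis

def remove_discriminatory_subspace(weight, acts_forget, acts_retain, k_f=10, k_r=50):
    """Project a (out, in) weight away from forget-only activation directions.

    The forget-class basis is first deflated by the retain-class subspace so that
    directions shared with retained classes are preserved (simplified sketch).
    """
    Uf = top_subspace(acts_forget, k_f)
    Ur = top_subspace(acts_retain, k_r)
    Uf_only = Uf - Ur @ (Ur.T @ Uf)                      # remove shared components
    Q, _ = torch.linalg.qr(Uf_only)                      # re-orthonormalize
    P_dis = Q @ Q.T                                      # discriminatory projector
    return weight @ (torch.eye(weight.shape[1]) - P_dis)
```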
4. Theoretical Guarantees, Impossibility Results, and Neural Collapse
Several works have placed these mechanisms on firm mathematical ground:
- Linear Feature Extractors: For linear features, doubly projected gradient descent (DPGD) provably avoids catastrophic forgetting in polynomial time, since each new feature can be orthogonalized against previous ones (Peng et al., 2022).
- Nonlinear Feature Extractors: It is provably impossible to avoid deep forgetting in general for nonlinear feature extractors without sample replay or model expansion (Peng et al., 2022). Deep nonconvex representations entangle coordinates irreversibly with task allocation.
- Neural Collapse Framework: The Neural Collapse regime (Lanzillotta et al., 8 Dec 2025) formalizes the behavior of features in continual learning under replay. Minimal buffer sizes can fully anchor deep representations and preserve the ETF (Equiangular Tight Frame) structure, preventing deep (geometric) forgetting; a geometry check is sketched after this list. Classifier-level (shallow) forgetting, however, closes only gradually with buffer size due to rank-deficient statistics; subspace regularization or mean/covariance correction is needed to restore reliable boundaries.
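As a concrete check of the geometric (deep) forgetting criterion above, the sketch below (the function name and max-error readout are illustrative) measures how far old-class feature means deviate from a simplex ETF, whose centered, normalized class means have pairwise cosine similarity -1/(K-1).

```python
import numpy as np

def etf_deviation(feats, labels):
    """Deviation of class-mean geometry from a simplex ETF.

    Under neural collapse, centered and normalized class means have pairwise
    cosine similarity -1/(K-1); the returned value is the max absolute error.
    """
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    means = means - means.mean(axis=0, keepdims=True)          # global centering
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    cosines = means @ means.T
    target = -1.0 / (K - 1)
    off_diag = cosines[~np.eye(K, dtype=bool)]
    return np.abs(off_diag - target).max()

# Tracking etf_deviation on old-class features before and after a new task gives a
# direct readout of deep (geometric) forgetting in the neural-collapse sense.
```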
5. Empirical Findings Across Domains and Architectures
Experiments spanning vision benchmarks converge on the following findings:
- Freezing the feature extractor (DeeSIL, FFNB) eliminates deep forgetting but leaves shallow classifier boundaries vulnerable to crowding and negative memory limitations (Belouadah et al., 2018, Sahbi et al., 2021).
- Prototype replay and null-space anchoring decisively reduce both deep and shallow forgetting in detectors, with classifier-level replay needed for high AP preservation (Wu et al., 8 Feb 2025).
- Linear-probe and exclusion metrics demonstrate that features forget as catastrophically as output heads under continued task accrual, once normalized by learned information (Hess et al., 2023).
- Generative feature replay achieves competitive accuracy and memory efficiency by synthesizing features for classifier training, eliminating the need for extensive image buffering while defending against classifier forgetting (Liu et al., 2020).
- OPC and Deep Unlearning methods produce one-point contraction or discriminatory subspace erasure, guaranteeing resistance to recovery attacks, membership inference, and semantic leakage, evidencing the necessity of deep forgetting for privacy-sensitive unlearning (Jung et al., 10 Jul 2025, Kodge et al., 2023).
- Best-in-class replay (DER++ + BFP) on Split-CIFAR-100 boosts final accuracy by up to 8.6 points and preserves CKA similarity ∼0.9 for old classes (Gu et al., 2023).
6. Design Recommendations and Practical Implications
Key recommendations and implications drawn from reviewed works:
- Freeze or anchor deep representations to avoid feature drift; control classifier-memory size and employ diversity sampling to minimize shallow forgetting (Belouadah et al., 2018).
- Use experience replay strategically: minimal buffer suffices for deep retention; buffer must cover sufficient class/task diversity and dimensionality to avoid shallow 'blindness' (Lanzillotta et al., 8 Dec 2025).
- Regularize classifier heads to be compatible with true population statistics rather than just buffer subspaces; apply covariance inflation or mean correction as needed (Lanzillotta et al., 8 Dec 2025); an illustrative head-recalibration sketch follows this list.
- For privacy or compliance unlearning, deep contraction methods provide resilience against inference and reconstruction, overcoming the limitations of shallow prediction-only forgetting (Jung et al., 10 Jul 2025, Kodge et al., 2023).
- Feature-space forgetting limits cross-task knowledge accumulation; ensemble or functionally regularized methods retain substantially more knowledge for downstream tasks (Hess et al., 2023).
- Evaluation protocols should measure both output and representation-level forgetting, as well as knowledge accumulation and recovery vulnerability (Hess et al., 2023, Jung et al., 10 Jul 2025).
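One illustrative way to implement the mean/covariance correction recommended above is to periodically rebuild the linear head in closed form from corrected class statistics rather than from buffer-only gradients. The LDA-style construction below is an assumption for illustration, not the specific procedure of the cited work.

```python
import numpy as np

def rebuild_linear_head(class_means, shared_cov, priors=None, ridge=1e-3):
    """Closed-form linear head from class means and a shared covariance (LDA-style).

    class_means: (K, d) corrected per-class means (e.g., stored prototypes adjusted
                 for estimated drift); shared_cov: (d, d), possibly inflated to
                 compensate for rank-deficient buffer statistics.
    Returns weights (K, d) and biases (K,) for class scores W @ x + b.
    """
    K, d = class_means.shape
    priors = np.full(K, 1.0 / K) if priors is None else priors
    cov = shared_cov + ridge * np.eye(d)                 # covariance inflation / damping
    cov_inv = np.linalg.inv(cov)
    W = class_means @ cov_inv                            # (K, d)
    b = -0.5 * np.einsum("kd,kd->k", W, class_means) + np.log(priors)
    return W, b
```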
This suggests that robust continual and incremental learning requires explicit attention to both deep and shallow mechanisms—anchoring latent geometry for stability, and actively managing classifier boundaries for true retention and privacy.