Class-Incremental Learning Framework
- Class-incremental learning is a framework that sequentially integrates new classes without requiring full retraining, addressing the stability-plasticity dilemma.
- Techniques such as knowledge distillation, prototype classifiers, and replay buffers are employed to mitigate catastrophic forgetting.
- Recent advancements involve adaptive distillation, decentralized learning, and data-free methods that enhance scalability and reduce annotation costs.
Class-incremental learning (CIL) frameworks enable deep models to sequentially accommodate new classes without access to the full historical dataset, while minimizing catastrophic forgetting of prior knowledge. CIL constitutes a central problem in continual learning, robotic adaptation, resource-frugal AI, and distributed learning, as it addresses the fundamental stability–plasticity dilemma in neural networks challenged by nonstationary data streams.
1. Fundamental Formulation and Taxonomy
CIL is formally defined by a sequence of states $\mathcal{S}_1, \dots, \mathcal{S}_T$, each corresponding to a disjoint batch of new classes and their associated data (Belouadah et al., 2020). At each increment $t$, the learner is presented only with current-task data and optionally a limited-size buffer of prior exemplars, and updates its model to classify over all classes seen up to state $t$.
CIL frameworks can be distinguished by several axes:
- Data memory: standard CIL allows a small buffer of past-class examples; exemplar-free variants disallow any real data storage (Huang et al., 2024).
- Increment protocol: tasks may be presented as single classes, batches, or in a few-shot regime (FSCIL) (Nema et al., 16 Jan 2025).
- Update model: methods range from fixed-representation (backbone frozen) (Belouadah et al., 2020) to full end-to-end continual updating with distillation (Kang et al., 2022), meta-learning (Karim et al., 2021), or self-supervision (Ni et al., 2021; Chen et al., 2023).
- Resource models: centralized versus federated/decentralized, online versus batch (Zhang et al., 2022).
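The data-memory and increment-protocol axes above can be made concrete with a minimal pure-Python sketch of the standard buffered CIL loop. The model update itself is elided, and `cil_loop` with its class-balanced buffer policy is illustrative, not drawn from any cited paper:

```python
import random

def cil_loop(tasks, buffer_size=10, seed=0):
    """Minimal class-incremental protocol: at each state the learner sees
    only the current task's data plus a small exemplar buffer, yet must
    classify over all classes seen so far."""
    random.seed(seed)
    seen_classes, buffer = [], []
    for task in tasks:                       # task = list of (sample, class_label)
        seen_classes.extend(sorted({c for _, c in task}))
        # ... model update on `task + buffer` would happen here ...
        # exemplar management: keep a class-balanced subset of everything held
        pool = buffer + task
        per_class = max(1, buffer_size // len(seen_classes))
        buffer = []
        for c in seen_classes:
            members = [x for x in pool if x[1] == c]
            buffer.extend(random.sample(members, min(per_class, len(members))))
    return seen_classes, buffer
```

Exemplar-free variants correspond to `buffer_size=0` with the replay step removed; the FSCIL regime shrinks each `task` to a handful of samples per class.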
2. Key Mechanisms for Mitigating Catastrophic Forgetting
Mitigation of catastrophic forgetting in CIL operates at three principal loci: controlling representation drift, correcting classifier distortion, and replaying stored or synthesized memories.
a) Knowledge Distillation and Feature Regularization
- Deep CIL frameworks commonly distill the outputs or features of an old model ("teacher") into the current one ("student") using loss terms such as

$$\mathcal{L}_{\text{KD}} = \tau^{2}\,\mathrm{KL}\!\left(\sigma(z^{\text{old}}/\tau)\,\big\|\,\sigma(z^{\text{new}}/\tau)\right),$$

sometimes with margin or prototype constraints, to align logits or feature manifolds (Belouadah et al., 2020; Kang et al., 2022).
- Recent advances employ adaptive, channel- or feature-wise importance estimation to consolidate "critical" representations and permit plasticity elsewhere (Kang et al., 2022).
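A minimal numpy sketch of the temperature-softened logit-distillation term, in its generic knowledge-distillation form rather than any single paper's exact loss:

```python
import numpy as np

def softmax(z, tau=1.0):
    z = np.asarray(z, dtype=float) / tau
    z -= z.max(axis=-1, keepdims=True)     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, tau=2.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch.
    Penalizing divergence from the frozen old model keeps the student's
    predictions on previously seen classes from drifting."""
    p = softmax(teacher_logits, tau)       # frozen old-model distribution
    q = softmax(student_logits, tau)       # current-model distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return tau ** 2 * kl.mean()            # tau^2 restores gradient scale
```

The loss vanishes when student and teacher agree, and grows as the student reallocates probability mass away from old-class predictions.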
b) Prototype and Class-Mean Mechanisms
- Fixed or incrementally updated prototype classifiers compute class centroids in the frozen feature space, using nearest-class-mean assignment (NCM), SVMs, or SLDA classifiers, and largely eliminate classifier distortion (Belouadah et al., 2020; Liu et al., 2023; Huang et al., 2024).
- Self-supervised or hybrid approaches use prototype clustering and embedding reservation to maintain feature space for future classes (Chen et al., 2023).
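Nearest-class-mean assignment over a frozen feature space can be sketched as follows (numpy; an illustrative implementation, not taken from any specific paper):

```python
import numpy as np

class NCMClassifier:
    """Prototype classifier: each class is represented by the mean of its
    (frozen) feature embeddings; prediction picks the nearest prototype.
    Adding a class never perturbs existing prototypes, which avoids the
    classifier distortion of jointly trained linear heads."""
    def __init__(self):
        self.prototypes = {}               # class label -> mean feature vector

    def add_class(self, label, features):
        self.prototypes[label] = np.asarray(features, dtype=float).mean(axis=0)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        labels = list(self.prototypes)
        protos = np.stack([self.prototypes[c] for c in labels])
        dists = np.linalg.norm(protos - x, axis=1)
        return labels[int(dists.argmin())]
```

Because `add_class` only appends a centroid, incorporating a new class is a constant-time update with no retraining of the decision rule.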
c) Replay and Exemplar Selection
- Exemplar replay buffers are selected using strategies such as herding (to best approximate class means), Maximum Mean Discrepancy–based alignment (Liu et al., 20 Oct 2025), or clustering-based diversity maximization, with memory distributed to balance old/new class frequencies (Belouadah et al., 2020; Liu et al., 20 Oct 2025).
- Data-free approaches synthesize "class impressions" (pseudo-samples) via batch-norm-regularized optimization in the feature space (Ayromlou et al., 2022).
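Herding greedily picks exemplars whose running mean best approximates the class mean in feature space; a numpy sketch of the iCaRL-style selection rule:

```python
import numpy as np

def herding_select(features, m):
    """Greedily choose m exemplar indices whose cumulative mean stays as
    close as possible to the true class mean in feature space."""
    feats = np.asarray(features, dtype=float)
    mu = feats.mean(axis=0)                # target: true class mean
    chosen, acc = [], np.zeros_like(mu)    # picked indices; running sum
    for k in range(1, m + 1):
        # distance of each candidate's resulting running mean to the class mean
        gains = np.linalg.norm(mu - (acc + feats) / k, axis=1)
        gains[chosen] = np.inf             # never reselect an exemplar
        i = int(gains.argmin())
        chosen.append(i)
        acc += feats[i]
    return chosen
```

The first exemplar chosen is simply the sample closest to the class mean; subsequent picks correct the residual between the buffer mean and the class mean.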
3. Representative Frameworks and Methodological Innovations
A selection of recent frameworks and their principal characteristics includes:
| Framework | Core Mechanism(s) | Exemplar Use |
|---|---|---|
| iCaRL, BiC, PODNet | Cross-entropy + replay + distillation | Buffer |
| AFC (Kang et al., 2022) | Feature-map importance for adaptive distillation | Buffer |
| IPC (Liu et al., 2023) | Self-supervised fixed encoder + prototype increment | None |
| G2B (Wu et al., 2024) | Two-branch, per-block modulation via side-branch CNN | Buffer |
| KCCIOL (Karim et al., 2021) | Meta-learning w/ knowledge consolidation & masks | None |
| IR (Huang et al., 2024) | Dataset augmentation + L2 space-maintenance loss | None |
| EndoCIL (Liu et al., 20 Oct 2025) | MMD replay, prior-balanced loss, gradient calibration | Buffer |
| DCID (Zhang et al., 2022) | Decentralized learning + multistage distillation | Per-site anchors |
Key methodological advances include:
- Self-supervised learning for CIL: Replacing label prediction with contrastive InfoNCE objectives decouples representation robustness from classifier drift, yielding high anti-forgetting and cross-phase generalization (Ni et al., 2021; Chen et al., 2023).
- Active and frugal CIL: Selective labeling and compressed buffer strategies (Active CIL, CIFNet) drastically reduce annotation cost and memory footprint while sustaining competitive accuracy (Dopico-Castro et al., 14 Sep 2025; Bhattacharya et al., 4 Feb 2026).
- Data-free incremental learning: Synthesis of anchor images using frozen models, followed by combined contrastive, normalized cross-entropy, and margin losses, achieves strong incremental adaptation without privacy-sensitive storage (Ayromlou et al., 2022).
- Two-branch and modulation architectures: Side-branched convolutional modules can sparsify and stabilize feature propagation, consistently improving baseline CIL performance without introducing specialized losses (Wu et al., 2024).
4. Evaluation Protocols, Datasets, and Metrics
Experimental protocols follow standardized incremental splits of CIFAR-100 (10 classes per increment over 10 increments), ImageNet-100/1000 (various splits), TinyImageNet, Omniglot, and task-specific datasets (e.g., MNIST, CUB200, UCF101 for action recognition) (Belouadah et al., 2020; Liu et al., 20 Oct 2025; Park et al., 2022). Incremental test accuracy is universally reported as mean top-k accuracy over all previously seen classes after each phase, with "last" and "average" accuracy metrics, and explicit computation of forgetting:

$$F_t = a_{\text{init}} - a_t,$$

where $a_t$ is the accuracy of old classes after learning state $t$ and $a_{\text{init}}$ is their accuracy when first learned (Belouadah et al., 2020).
Alternative metrics include:
- Backward transfer (BWT): Drop in previously acquired accuracy, $\mathrm{BWT}=\frac{1}{T-1}\sum_{i=1}^{T-1}\left(R_{T,i}-R_{i,i}\right)$, where $R_{t,i}$ is accuracy on task $i$ after state $t$,
- Retention: Performance only on initial (base) classes (Bhattacharya et al., 4 Feb 2026),
- Annotation cost: Labels queried per increment (Bhattacharya et al., 4 Feb 2026),
- Memory use and computational sustainability (Dopico-Castro et al., 14 Sep 2025),
- Open-set AUROC for detection of unknown classes integrated in frameworks such as OpenIncrement (Xu et al., 2023).
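These metrics reduce to simple arithmetic over an accuracy matrix. A pure-Python sketch under the convention that `R[t][i]` is the accuracy on task `i` after learning state `t` (illustrative definitions matching common usage, not any single paper's exact protocol):

```python
def cil_metrics(R):
    """R[t][i]: accuracy on task i after learning state t (defined for t >= i).
    Returns last accuracy, average incremental accuracy, mean forgetting,
    and backward transfer over T states."""
    T = len(R)
    last_acc = sum(R[-1][:T]) / T                      # mean over all tasks at the end
    avg_acc = sum(sum(R[t][: t + 1]) / (t + 1) for t in range(T)) / T
    # forgetting: best past accuracy on a task minus its final accuracy
    forgetting = sum(max(R[t][i] for t in range(i, T)) - R[-1][i]
                     for i in range(T - 1)) / (T - 1)
    # BWT: change in each earlier task's accuracy between learning it and the end
    bwt = sum(R[-1][i] - R[i][i] for i in range(T - 1)) / (T - 1)
    return last_acc, avg_acc, forgetting, bwt
```

Forgetting is nonnegative by construction, while BWT is typically negative; a method with zero forgetting has BWT of at most zero.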
5. Stability–Plasticity, Limitations, and Open Problems
CIL frameworks balance stability (retaining old knowledge) and plasticity (acquiring new classes). Catastrophic forgetting arises from both feature drift and classifier bias; replay and distillation address this at the representation and output levels, respectively. Asymptotic accuracy remains below joint retraining in most standard regimes, with the residual gap largely due to representation overlap and insufficient allocation of feature space for novel classes (Ni et al., 2021; Chen et al., 2023; Liu et al., 2023).
Salient limitations and open challenges:
- Scalability to deep networks and very long sequence streams, especially with minimal memory (Karim et al., 2021; Liu et al., 20 Oct 2025).
- Decentralized/heterogeneous data: Ensuring robust aggregation and knowledge transfer under non-IID or privacy-constrained data regimes (Zhang et al., 2022).
- Theoretical guarantees: Most frameworks provide no formal bounds on forgetting or sample complexity.
- Class order and semantic shift: Sensitivity to curriculum order and inter-task semantic drift persists (Ni et al., 2021; Kalla et al., 2024).
- Buffer compression and selection: Efficient, diverse, and privacy-preserving replay remains under active study (Liu et al., 20 Oct 2025; Dopico-Castro et al., 14 Sep 2025).
6. Specialized Extensions and Domains
- Few-shot CIL (FSCIL): Designed for extremely low-shot increments, requiring maximal separation of existing classes and feature-space "pre-allocation" (Nema et al., 16 Jan 2025).
- Action/video recognition: Incorporates temporal-channel importance and orthogonality-regularized distillation to address temporal redundancy (Park et al., 2022).
- Active and cost-aware CIL: Reduces real annotation demands via diversity- and uncertainty-based sampling (Bhattacharya et al., 4 Feb 2026).
- Open-set CIL: Merges open-set recognition with CIL for practical nonstationary environments, using embedding compactness/separability constraints (Xu et al., 2023).
- Endoscopic and medical imaging: Specialized frameworks (EndoCIL) tackle severe class imbalance and multi-source distribution shifts with domain-consistent replay and loss calibration (Liu et al., 20 Oct 2025).
7. Practical Considerations and Recommendations
The choice of CIL framework depends critically on memory constraints, annotation cost, computational sustainability, and the degree of permissible representation change. For bounded memory and frequent increments, simple bias-calibrated fine-tuning or buffered replay with prototype classifiers perform robustly (Belouadah et al., 2020). When memory is disallowed or privacy is paramount, self-supervised and data-free CIL offer marked forgetting resistance (Huang et al., 2024; Ayromlou et al., 2022; Liu et al., 2023).
Across regimes, integration of feature augmentation, dynamic loss calibration, self- or meta-supervision, and lightweight classifier expansion constitutes the state of the art in minimizing catastrophic forgetting and enhancing long-range adaptation. Comprehensive benchmarks and open-source codebases facilitate reproducibility and cross-method comparison (Belouadah et al., 2020; Zhang et al., 2022). Future advances are likely to focus on scaling to realistic distributed data, optimizing for annotation and energy cost, and formalizing stability–plasticity trade-offs in ever-larger streaming contexts.