
Class-Incremental Learning Framework

Updated 7 March 2026
  • Class-incremental learning is a framework that sequentially integrates new classes without requiring full retraining, addressing the stability-plasticity dilemma.
  • Techniques such as knowledge distillation, prototype classifiers, and replay buffers are employed to mitigate catastrophic forgetting.
  • Recent advancements involve adaptive distillation, decentralized learning, and data-free methods that enhance scalability and reduce annotation costs.

Class-incremental learning (CIL) frameworks enable deep models to sequentially accommodate new classes without access to the full historical dataset, while minimizing catastrophic forgetting of prior knowledge. CIL constitutes a central problem in continual learning, robotic adaptation, resource-frugal AI, and distributed learning, as it addresses the fundamental stability–plasticity dilemma in neural networks challenged by nonstationary data streams.

1. Fundamental Formulation and Taxonomy

CIL is formally defined by a sequence of states $\mathcal{S}_t$ ($t=0,\ldots,T-1$), each corresponding to a disjoint batch of $P_t$ new classes $\mathcal{C}_t$ and their associated data $\mathcal{D}_t^{\text{new}}$ (Belouadah et al., 2020). At each increment, the learner is presented only with current-task data and optionally a limited-size buffer of prior exemplars, and updates its model $\mathcal{M}_t$ to classify over all $N_t=\sum_{i=0}^t P_i$ seen classes.
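The incremental protocol above can be sketched as a loop over states. This is a minimal sketch under stated assumptions: `train_task` and `evaluate` are hypothetical placeholders for a concrete method's update and evaluation routines, and the buffer policy shown (keep the most recent samples up to a cap) stands in for a real exemplar-selection strategy.

```python
# Minimal sketch of the class-incremental protocol: each task brings a
# disjoint batch of new classes; the learner sees only current data plus
# a bounded exemplar buffer, and is evaluated over all classes seen so far.
from typing import Callable, Sequence


def run_cil(
    tasks: Sequence[tuple],   # each task: (new_class_ids, new_data)
    train_task: Callable,     # hypothetical: updates model on data + buffer
    evaluate: Callable,       # hypothetical: accuracy over all seen classes
    buffer_size: int = 2000,
):
    model, buffer, seen_classes = None, [], []
    history = []
    for classes_t, data_t in tasks:
        seen_classes.extend(classes_t)          # N_t grows by P_t per state
        model = train_task(model, data_t, buffer)
        # naive bounded buffer refresh (a real method would select exemplars)
        buffer = (buffer + list(data_t))[-buffer_size:]
        history.append(evaluate(model, seen_classes))
    return model, history
```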

CIL frameworks can be distinguished along several axes, including whether an exemplar memory is permitted, whether the feature extractor is frozen or fine-tuned at each increment, and how classifier bias toward recently added classes is handled.

2. Key Mechanisms for Mitigating Catastrophic Forgetting

Mitigation of catastrophic forgetting in CIL operates at three principal architectural loci: representation drift, classifier distortion, and memory replay.

a) Knowledge Distillation and Feature Regularization

  • Deep CIL frameworks commonly distill the outputs or features of an old model ("teacher") into the current ("student") using loss terms such as

$$\mathcal{L}^{\text{distill}} = -\sum_{(x,y)\in\mathcal{K}} \Phi_{t-1}^j(x) \log \Phi_t^j(x),$$

sometimes with margin or prototype constraints, to align logits or feature manifolds (Belouadah et al., 2020; Kang et al., 2022).

  • Recent advances employ adaptive, channel- or feature-wise importance estimation to consolidate "critical" representations and permit plasticity elsewhere (Kang et al., 2022).
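A sketch of the distillation term in NumPy: the loss is the cross-entropy between the old model's ("teacher's") and current model's ("student's") output distributions, matching the quoted loss at temperature 1. The temperature parameter is a common extension, not part of the quoted formula.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """-sum_j Phi_{t-1}^j(x) log Phi_t^j(x), averaged over the batch.
    Temperature smoothing (T > 1) is a common extension beyond the
    quoted loss, which corresponds to T = 1."""
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())
```

When student and teacher agree, the loss reduces to the teacher's entropy; it grows as the student's distribution drifts away, which is exactly the drift these frameworks penalize.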

b) Prototype and Class-Mean Mechanisms

  • Fixed or incrementally updated prototype classifiers compute class centroids in the frozen feature space, using nearest-class-mean assignment (NCM), SVMs, or SLDA classifiers, and largely eliminate classifier distortion (Belouadah et al., 2020; Liu et al., 2023; Huang et al., 2024).
  • Self-supervised or hybrid approaches use prototype clustering and embedding reservation to maintain feature space for future classes (Chen et al., 2023).
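A minimal NumPy sketch of a nearest-class-mean (NCM) classifier over precomputed features illustrates why classifier distortion is largely eliminated: adding a new class only appends a centroid, leaving existing classes untouched.

```python
import numpy as np


def ncm_fit(features, labels):
    """Compute one centroid per class in a (typically frozen) feature space."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means


def ncm_predict(features, classes, means):
    """Nearest-class-mean assignment: label of the closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]
```

Incremental extension is trivial: for each new class, compute its centroid and append a row to `means` and an entry to `classes`.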

c) Replay and Exemplar Selection
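Exemplar selection for a replay buffer is commonly done by herding, as popularized by iCaRL: greedily pick the samples whose running mean best approximates the true class mean in feature space. A NumPy sketch, assuming features have already been extracted:

```python
import numpy as np


def herding_select(features, m):
    """Greedy herding: return indices of m exemplars whose running mean
    stays closest to the full class mean in feature space."""
    mu = features.mean(axis=0)
    selected = []
    chosen = np.zeros(len(features), dtype=bool)
    running_sum = np.zeros_like(mu)
    for k in range(1, m + 1):
        # distance of the candidate running mean (if sample i were added)
        # to the true class mean, for every remaining sample i
        gaps = np.linalg.norm(mu - (running_sum + features) / k, axis=1)
        gaps[chosen] = np.inf
        i = int(gaps.argmin())
        chosen[i] = True
        selected.append(i)
        running_sum += features[i]
    return selected
```

At inference time the same stored exemplars can double as the support set for the class-mean classifiers of the previous subsection.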

3. Representative Frameworks and Methodological Innovations

A selection of recent frameworks and their principal characteristics includes:

| Framework | Core Mechanism(s) | Exemplar Use |
|---|---|---|
| iCaRL, BiC, PODNet | Cross-entropy + replay + distillation | Buffer |
| AFC (Kang et al., 2022) | Feature-map importance for adaptive distillation | Buffer |
| IPC (Liu et al., 2023) | Self-supervised fixed encoder + prototype increment | None |
| G2B (Wu et al., 2024) | Two-branch, per-block modulation via side-branch CNN | Buffer |
| KCCIOL (Karim et al., 2021) | Meta-learning w/ knowledge consolidation & masks | None |
| IR (Huang et al., 2024) | Dataset augmentation + L2 space-maintenance loss | None |
| EndoCIL (Liu et al., 20 Oct 2025) | MMD replay, prior-balanced loss, gradient calibration | Buffer |
| DCID (Zhang et al., 2022) | Decentralized learning + multistage distillation | Per-site anchors |

Key methodological advances include:

  • Self-supervised learning for CIL: Replacing label prediction with contrastive InfoNCE objectives decouples representation robustness from classifier drift, yielding high anti-forgetting and cross-phase generalization (Ni et al., 2021; Chen et al., 2023).
  • Active and frugal CIL: Selective labeling and compressed buffer strategies (Active CIL, CIFNet) drastically reduce annotation cost and memory footprint while sustaining competitive accuracy (Dopico-Castro et al., 14 Sep 2025; Bhattacharya et al., 4 Feb 2026).
  • Data-free incremental learning: Synthesis of anchor images using frozen models, followed by combined contrastive, normalized cross-entropy, and margin losses, achieves strong incremental adaptation without privacy-sensitive storage (Ayromlou et al., 2022).
  • Two-branch and modulation architectures: Side-branched convolutional modules can sparsify and stabilize feature propagation, consistently improving baseline CIL performance without introducing specialized losses (Wu et al., 2024).
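The contrastive InfoNCE objective mentioned above can be sketched in NumPy. Assumptions: `z1` and `z2` are L2-normalized embeddings of two augmented views of the same batch, so positives lie on the diagonal of the similarity matrix and all other pairs serve as negatives.

```python
import numpy as np


def info_nce(z1, z2, tau=0.1):
    """InfoNCE over a batch: z1[i] should match z2[i] (positive pair)
    against all other z2[j] (negatives)."""
    logits = z1 @ z2.T / tau                        # (N, N) similarities
    # row-wise log-softmax, computed stably
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(z1)
    return float(-log_probs[np.arange(n), np.arange(n)].mean())
```

Because the objective depends only on embeddings, not on class labels, the learned representation is insulated from classifier drift across incremental phases, which is the decoupling the bullet above describes.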

4. Evaluation Protocols, Datasets, and Metrics

Experimental protocols follow standardized incremental splits of CIFAR-100 (e.g., 10 increments of 10 classes each), ImageNet-100/1000 (various splits), TinyImageNet, Omniglot, and task-specific datasets (e.g., MNIST, CUB200, UCF101 for action recognition) (Belouadah et al., 2020; Liu et al., 20 Oct 2025; Park et al., 2022). Incremental test accuracy is universally reported as mean top-k accuracy over all previously seen classes after each phase, with "last" and "average" accuracy metrics, and explicit computation of forgetting:

$$F = \frac{1}{T-1} \sum_{t=1}^{T-1} \Big[ \max_{i=0,\ldots,t-1} a_i - a_t \Big],$$

where $a_t$ is the accuracy of old classes after learning state $t$ (Belouadah et al., 2020).
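The forgetting measure translates directly into code. A NumPy sketch, where `acc_old[t]` holds the old-class accuracy after state t:

```python
import numpy as np


def forgetting(acc_old):
    """F = (1/(T-1)) * sum_{t=1..T-1} [ max_{i<t} a_i - a_t ],
    where acc_old[t] is the accuracy on old classes after state t."""
    a = np.asarray(acc_old, dtype=float)
    drops = [a[:t].max() - a[t] for t in range(1, len(a))]
    return float(np.mean(drops))
```

Note that the measure can be negative when later states improve old-class accuracy (backward transfer).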

Alternative metrics, such as average incremental accuracy across phases and backward transfer, are also reported.

5. Stability–Plasticity, Limitations, and Open Problems

CIL frameworks balance stability (retaining old knowledge) and plasticity (acquiring new classes). Catastrophic forgetting arises from both feature drift and classifier bias; replay and distillation address this at the representation and output levels, respectively. Asymptotic accuracy remains below joint retraining in most standard regimes, with the residual gap largely due to representation overlap and insufficient allocation of feature space for novel classes (Ni et al., 2021; Chen et al., 2023; Liu et al., 2023).

Salient limitations and open challenges include the residual accuracy gap to joint retraining, sensitivity to buffer size and privacy constraints on stored exemplars, annotation and compute costs that grow with the stream, and scaling to realistic distributed data.

6. Specialized Extensions and Domains

  • Few-shot CIL (FSCIL): Designed for extremely low-shot increments, requiring maximal separation of existing classes and feature-space "pre-allocation" (Nema et al., 16 Jan 2025).
  • Action/video recognition: Incorporates temporal-channel importance and orthogonality-regularized distillation to address temporal redundancy (Park et al., 2022).
  • Active and cost-aware CIL: Reduces real annotation demands via diversity- and uncertainty-based sampling (Bhattacharya et al., 4 Feb 2026).
  • Open-set CIL: Merges open-set recognition with CIL for practical nonstationary environments, using embedding compactness/separability constraints (Xu et al., 2023).
  • Endoscopic and medical imaging: Specialized frameworks (EndoCIL) tackle severe class imbalance and multi-source distribution shifts with domain-consistent replay and loss calibration (Liu et al., 20 Oct 2025).

7. Practical Considerations and Recommendations

The choice of CIL framework depends critically on memory constraints, annotation cost, computational sustainability, and the degree of permissible representation change. For bounded memory and frequent increments, simple bias-calibrated fine-tuning or buffered replay with prototype classifiers perform robustly (Belouadah et al., 2020). When memory is disallowed or privacy is paramount, self-supervised and data-free CIL offer marked forgetting resistance (Huang et al., 2024; Ayromlou et al., 2022; Liu et al., 2023).

Across regimes, integration of feature augmentation, dynamic loss calibration, self- or meta-supervision, and lightweight classifier expansion constitutes the state of the art in minimizing catastrophic forgetting and enhancing long-range adaptation. Comprehensive benchmarks and open-source codebases facilitate reproducibility and cross-method comparison (Belouadah et al., 2020; Zhang et al., 2022). Future advances are likely to focus on scaling to realistic distributed data, optimizing for annotation and energy cost, and formalizing stability–plasticity trade-offs in ever-larger streaming contexts.
