
Prototype-based Continual Learning

Updated 29 January 2026
  • Prototype-based continual learning is a paradigm that uses compact feature vectors (prototypes) to represent classes and mitigate issues like catastrophic forgetting.
  • It employs methods such as clustering, semantic alignment, and learnable drift compensation to update prototypes efficiently under evolving data distributions.
  • Integrating with meta-learning and federated frameworks, it delivers memory efficiency and interpretable decision-making across varied continual learning applications.

Prototype-based continual learning is a paradigm in which models represent knowledge about classes, tasks, or domains using summary feature vectors (“prototypes”) that encode the central tendency or structural pattern associated with each entity. These prototypes are leveraged for memory-efficient rehearsal, task discrimination, drift compensation, interpretable decision-making, and fast adaptation to new data distributions. The approach addresses core continual learning challenges including catastrophic forgetting, task-recency bias, concept drift, and class confusion under resource constraints and in heterogeneous application domains.

1. Mathematical Formulation and Prototype Construction

Prototype-based continual learning abstracts classes (or categories) with compact representations computed from feature embeddings. The canonical prototype for class $l$ at episode $i$ is defined as the mean of encoder outputs over a support set $S_{i,l}$,

$$\mathbf{c}_l = \frac{1}{|S_{i,l}|}\sum_{(x,y)\in S_{i,l}} h_{\phi_{\mathrm{proto}}}(x)$$

where $h_{\phi_{\mathrm{proto}}}$ is a trained feature extractor or embedding head. Variants extend this to part-based prototypes for interpretable regions (Rymarczyk et al., 2023), label-free cluster means (Aghasanli et al., 9 Apr 2025), variational Gaussian prototypes to model intra-class uncertainty (Zhang et al., 2019), and learnable prototype structures for multi-level task or node abstraction in graphs (Zhang et al., 2021).

Prototype construction can employ clustering (e.g., K-means or band-based selection (Aghasanli et al., 9 Apr 2025), loss-trajectory clustering (Rahmani et al., 13 Jan 2025)), semantic alignment strategies (e.g., initial grounding in new task samples (Fuente et al., 12 May 2025)), or specialized initialization schemes for fine-grained interpretability (Rymarczyk et al., 2023). Storage budgets are typically bounded per class or cluster, enabling tight memory control compared to exemplar replay: $|\mathcal{M}| \leq nC$ for $n$ prototypes per each of $C$ classes (Ho et al., 2021).
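The class-mean construction and its memory bound can be sketched in a few lines. This is a minimal illustration of the canonical mean-prototype formula above, not any cited paper's implementation; the helper name and toy data are assumptions.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """One mean-vector prototype per class (the canonical c_l above)."""
    return {int(c): embeddings[labels == c].mean(axis=0)
            for c in np.unique(labels)}

# Toy example: six 4-d embeddings from two classes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
lab = np.array([0, 0, 0, 1, 1, 1])
protos = class_prototypes(emb, lab)
# Memory bound: n = 1 prototype per class, C = 2 classes, so n*C = 2 stored
# vectors, independent of how many samples were seen.
```

The dictionary grows with the number of classes only, which is exactly the $|\mathcal{M}| \leq nC$ bound contrasted against exemplar replay.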

2. Prototype Replay, Update, and Drift Compensation

Representing history via prototypes enables efficient rehearsal and stabilization of previously learned knowledge. Dynamic replay strategies select the most representative or challenging samples by ranking candidate embeddings via Euclidean or angular distance to their class prototypes (Ho et al., 2021, Rahmani et al., 13 Jan 2025). Prototype-guided buffers or rehearsal memories encode only summary statistics and can be updated online, at episode boundaries, or asynchronously in federated setups (Shenaj et al., 2023).
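The distance-ranked selection described above can be sketched as follows. The function name and the `mode` switch are illustrative assumptions; the papers cited differ in whether they keep the closest (most representative) or farthest (most challenging) candidates.

```python
import numpy as np

def select_replay(embeddings, labels, protos, k=2, mode="closest"):
    """Rank each class's candidates by Euclidean distance to its class
    prototype; keep the k closest (representative) or farthest (challenging).
    A sketch of prototype-guided buffer selection, not a specific paper's API."""
    keep = []
    for c, p in protos.items():
        idx = np.where(labels == c)[0]
        order = np.argsort(np.linalg.norm(embeddings[idx] - p, axis=1))
        if mode == "farthest":
            order = order[::-1]
        keep.extend(idx[order[:k]].tolist())
    return keep

# Toy 1-d-style embeddings: class 0 clusters near x=2, class 1 near x=13.7.
emb = np.array([[0.0, 0], [1, 0], [5, 0], [10, 0], [11, 0], [20, 0]])
lab = np.array([0, 0, 0, 1, 1, 1])
protos = {0: emb[:3].mean(axis=0), 1: emb[3:].mean(axis=0)}
buffer_idx = select_replay(emb, lab, protos, k=2)
```

Because only summary statistics (the prototypes) plus a small ranked subset are retained, the buffer can be refreshed online or at episode boundaries without storing the full history.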

However, prototypes suffer from semantic drift: as the feature extractor evolves, previously computed prototypes become misaligned with the new feature space, leading to degraded classification and catastrophic forgetting (Gomez-Villa et al., 2024). Learnable Drift Compensation (LDC) rectifies this by training a forward projector $p_F^t$ mapping old prototypes into the current backbone's embedding space,

$$P_{t}^{c} = p_F^t\!\left(P_{t-1}^{c}\right), \qquad \mathcal{L}_{\mathrm{LDC}} = \left\| p_F^t\!\left(f_{\theta}^{t-1}(x)\right) - f_{\theta}^{t}(x) \right\|_2^2$$

This semantic alignment recovers most of the accuracy lost to drift, even without exemplars, and generalizes to both supervised and self-supervised continual learning (Gomez-Villa et al., 2024).
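A minimal drift-compensation sketch, under the simplifying assumption of a *linear* projector: fitting it by least squares minimizes the same L2 objective as above restricted to linear maps (LDC itself trains the projector by gradient descent and need not be linear). The toy "drift" matrix and all variable names are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for current-task features under the old and new backbones;
# the drift here is a synthetic near-identity linear map.
feats_old = rng.normal(size=(200, 8))             # f_theta^{t-1}(x)
drift = np.eye(8) + 0.1 * rng.normal(size=(8, 8))
feats_new = feats_old @ drift                     # f_theta^{t}(x)

# Closed-form linear projector p_F^t: argmin || p_F(f_old) - f_new ||^2.
P, *_ = np.linalg.lstsq(feats_old, feats_new, rcond=None)

# Map a stored old-space prototype into the current embedding space.
proto_old = feats_old[:50].mean(axis=0)
proto_t = proto_old @ P
```

Only current-task data is needed to fit the projector, which is why the scheme remains exemplar-free: old prototypes are realigned without ever replaying old samples.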

3. Meta-Learning, Replay and Optimization Frameworks

Prototype-based schemes integrate seamlessly into meta-learning and replay-augmented optimization. For example, PMR frames continual learning as a nested meta-learning problem, using episodic prototype computation, selective memory updating, and meta-optimization of representation and prediction subnetworks (Ho et al., 2021). Prototypes are recomputed from new mini-supports every episode to track distributional shifts, and memory replay is orchestrated at fixed frequency or adaptively tied to confusion metrics (Wei et al., 2023).

Contrasts between global mean prototypes (constant-momentum updates (Lange et al., 2020)) and batch-wise online prototypes (OnPro equivalence (Wei et al., 2023)) illustrate the trade-off between stability and recency. Complex schemes enforce prototype equilibrium by contrastive losses that pull new data toward prototypes and push inter-class boundaries (Wei et al., 2023), or employ adaptive sampling and mixup along close class pairs to resolve misclassification (Wei et al., 2023).
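The stability-recency trade-off between the two update styles can be made concrete. Both helpers below are illustrative sketches (names and the momentum value are assumptions): the momentum prototype absorbs each batch slowly, while the batch-wise prototype reflects only the most recent data.

```python
import numpy as np

def momentum_update(proto, batch_mean, rho=0.9):
    """Global mean prototype with constant momentum: stable, slow to react."""
    return rho * proto + (1.0 - rho) * batch_mean

def online_prototype(batch_emb):
    """Batch-wise online prototype: tracks recency, higher variance."""
    return batch_emb.mean(axis=0)

proto = np.zeros(2)
batch = np.array([[1.0, 1.0], [3.0, 3.0]])   # batch mean = (2, 2)
proto = momentum_update(proto, online_prototype(batch))
# After one step the momentum prototype has moved only 10% of the way
# toward the batch mean; the online prototype would jump there directly.
```

A small `rho` recovers the batch-wise behavior, a `rho` near 1 the global-mean behavior, which is the trade-off axis the cited comparisons explore.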

Gradient imbalance, an artifact of prototype replay in online continual learning, is addressed by per-class hypergradient scale factors, learned via meta-optimization of gradient inner products (Michel et al., 26 Feb 2025). This Class-Wise Hypergradient (CWH) mechanism rebalances plasticity and stability across class heads during non-stationary training.

4. Interpretability, Self-Explainability, and Hierarchical Prototypes

Part-based prototype networks deliver instance-level and global interpretability by assigning semantic meaning to prototype activations (Rymarczyk et al., 2023, Valerio et al., 8 Dec 2025). Each prototype is associated with a class or part, and explanations are generated in two modes: globally, via the weights connecting prototypes to classes, and locally, via the spatial pattern of activated prototypes in a feature map. To prevent interpretability drift, ICICLE introduces a similarity distillation loss that penalizes changes in prototype activation regions for top-activated patches across tasks,

$$\mathcal{L}_{IR} = \sum_{i,j} \left| \mathrm{sim}\!\left(p^{\,t-1}, z_{i,j}^{t}\right) - \mathrm{sim}\!\left(p^{\,t}, z_{i,j}^{t}\right) \right| S_{i,j}$$

Bias compensation and head decorrelation further rebalance recency bias and promote orthogonality across tasks (Rymarczyk et al., 2023, Valerio et al., 8 Dec 2025). CIP-Net demonstrates the feasibility of a fixed shared prototype pool with task-specific frozen heads, enabling state-of-the-art exemplar-free continual learning with integrated local and global explanation capability, while using a constant-size architecture (Valerio et al., 8 Dec 2025).
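The distillation term above can be sketched directly: sum over spatial locations the change in prototype-patch similarity, weighted by a mask over top-activated patches. Cosine similarity and all helper names are assumptions here; the paper defines its own similarity function.

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def interpretability_distill(proto_old, proto_new, patches, S):
    """Sketch of L_IR: sum_{i,j} |sim(p^{t-1}, z_ij) - sim(p^t, z_ij)| * S_ij,
    where S masks the top-activated spatial locations."""
    loss = 0.0
    for (i, j), s in np.ndenumerate(S):
        z = patches[i, j]
        loss += abs(cos_sim(proto_old, z) - cos_sim(proto_new, z)) * s
    return loss

rng = np.random.default_rng(2)
patches = rng.normal(size=(3, 3, 5))            # 3x3 feature map of 5-d patches
S = np.zeros((3, 3)); S[0, 0] = S[1, 2] = 1.0   # two top-activated cells
p = rng.normal(size=5)
```

If the new prototype activates the same regions the same way, the penalty vanishes, which is precisely how the loss preserves the spatial grounding of explanations across tasks.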

5. Applications and Extensions: Federated, Open-World, Generative, and Graph-based CL

Prototype-based approaches generalize across diverse continual learning settings:

  • Federated asynchronous CL: Local and global prototypes anchor distributed clients, with server-side aggregation balancing asynchronous class presentation and preventing drift (Shenaj et al., 2023).
  • Open-domain CL: Category-aware intra-domain prototypes support training-free Task-ID discrimination and domain-aware prompt injection for robust domain separation and zero-shot preservation in VLMs (Lu et al., 2024).
  • Few-shot online CL for robotics: Metaplastic prototypes, each with adaptive learning rate tied to historical “goodness,” drive efficient base retention, sharp novelty detection, and unsupervised learning of novel concepts at ultra-low memory cost (Hajizada et al., 2024).
  • Generative replay: Class-conditional prototypes guide diffusion models to synthesize high-fidelity old task data, essential for mitigating generator drift in generative replay pipelines (Doan et al., 2023).
  • Continual graph learning: Hierarchical Prototype Networks construct atomic, node, and class-level prototypes via adaptive feature extractor pools, ensuring both memory-boundedness and lossless continual expansion of graph categories (Zhang et al., 2021).
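Several of the settings above (training-free Task-ID discrimination, novelty detection, classification) reduce at inference time to a nearest-prototype decision. A minimal sketch, with illustrative names:

```python
import numpy as np

def nearest_prototype(z, protos):
    """Training-free decision: return the key of the nearest prototype,
    usable for class prediction or Task-ID discrimination alike."""
    keys = list(protos)
    dists = [np.linalg.norm(z - protos[k]) for k in keys]
    return keys[int(np.argmin(dists))]

# Hypothetical per-task prototypes; a query routes to the nearer one.
protos = {"task_A": np.array([0.0, 0.0]), "task_B": np.array([10.0, 10.0])}
```

Because the rule involves no trainable parameters, adding a new task or class only requires inserting its prototype into the dictionary, which is what makes these schemes attractive in federated and open-world settings.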

6. Comparative Benchmarks, Memory Trade-offs, and Ablation Studies

Prototype-based methods consistently outperform strong replay, regularization, and prompt-based baselines in both class- and task-incremental settings (SplitCIFAR100, TinyImageNet, CUB-200, CORe50, OpenLORIS, among others) (Ho et al., 2021, Aghasanli et al., 9 Apr 2025, Valerio et al., 8 Dec 2025, Rymarczyk et al., 2023, Wei et al., 2023, Zhang et al., 2021).

7. Current Challenges and Prospects

Despite the demonstrated effectiveness, prototype-based continual learning confronts several open issues:

  • Semantic drift and prototype misalignment remain a fundamental challenge in moving-backbone regimes, partially addressed by projectors but susceptible to data coverage bias (Gomez-Villa et al., 2024).
  • Memory scaling with the number of classes and tasks, especially in prompt-prototype frameworks, necessitates dynamic budgeting and pruning (Luo et al., 8 Jan 2026).
  • Interpretability mechanisms, while advanced in part-based networks, may require tighter integration with natural concepts or human-in-the-loop feedback for robust semantic grounding (Valerio et al., 8 Dec 2025).
  • Most methods employ per-class prototypes; intra-class diversity and multi-modal distributions can benefit from multiple or hierarchical prototype assignment, though architectural and computational constraints must be managed (Doan et al., 2023, Zhang et al., 2021).
  • Extensions to open-set, unsupervised or regression settings may demand further innovation in prototype distillation, selection, and domain adaptation.

Prototype-based continual learning systems constitute a flexible and theoretically grounded solution to catastrophic forgetting and task adaptation. By leveraging low-dimensional abstractions, memory-efficient replay, and interpretable features, they facilitate robust scaling across diverse CL scenarios including online, federated, open-domain, and domain-incremental learning. Empirical and theoretical advancements continue to refine prototype selection, update, drift correction, and interpretability, making this an active and impactful area of research.
