Continual Learning Scenarios
- Continual Learning Scenarios are formal structures that define protocols for sequential, non-stationary learning tasks in machine learning.
- They distinguish between task-incremental, domain-incremental, and class-incremental settings to address unique challenges like catastrophic forgetting.
- Researchers use methods such as replay buffers, regularization, and dynamic architectures to manage shifting data distributions and preserve learned knowledge.
Continual learning scenarios constitute the formal structures and protocols under which machine learning models are exposed to sequential, non-stationary distributions of tasks or data streams. The evolution of continual learning as a research field centers on defining these scenarios with precise assumptions about label spaces, input distributions, knowledge transfer constraints, and available side-information at both training and inference. Canonical distinctions include task-incremental, domain-incremental, class-incremental, modality- and repetition-aware variants, and graph-specific extensions. Rigorous benchmarking and evaluation hinge on understanding the implications and computational difficulty of each scenario, as well as the failure modes that arise (e.g., catastrophic forgetting, interference, memory constraints).
1. Formal Taxonomy of Continual Learning Scenarios
Three principal scenarios recur throughout the recent literature: Task-Incremental Learning (Task-IL), Domain-Incremental Learning (Domain-IL), and Class-Incremental Learning (Class-IL) (Ven et al., 2019, Aljundi, 2019, Ven et al., 2024, Hsu et al., 2018).
Task-Incremental (Task-IL): A sequence of tasks $t = 1, \dots, T$, each defined by its own input distribution $p_t(x)$ and (possibly disjoint) label set $\mathcal{Y}_t$. At inference, the model receives the task identity, enabling the use of conditional mappings $f_\theta(x, t) \to \mathcal{Y}_t$, often realized as multi-headed architectures. Catastrophic forgetting is minimized via explicit parameter isolation, regularization, or dynamic architectures, with resource efficiency and transfer as key secondary objectives.
Domain-Incremental (Domain-IL): Tasks share a fixed label set $\mathcal{Y}$ but their input distributions $p_t(x)$ shift. Task identity is not supplied at test time; the model must generalize across domains without explicit separation. Forgetting and adaptation difficulty arise from covariate shift. Rehearsal, domain-adversarial adaptation, and batch-norm strategies are often employed.
Class-Incremental (Class-IL): Tasks introduce new, disjoint sets of classes $\mathcal{Y}_t$ (with $\mathcal{Y}_i \cap \mathcal{Y}_j = \emptyset$ for $i \neq j$), growing the total output space $\bigcup_{i \le t} \mathcal{Y}_i$. No task identity is provided at test time; the model must choose among all seen labels. This scenario produces severe forgetting, especially as decision boundaries for old classes are disturbed by new classes. Replay-based methods are indispensable; regularization alone is ineffective (Ven et al., 2019).
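A minimal sketch, assuming a split-style benchmark with equal-sized label sets per task (the class `IncrementalNet` and all sizes are illustrative), contrasting the Task-IL inference interface, which conditions on the task identity, with the Class-IL interface, which must score every class seen so far:

```python
import torch
import torch.nn as nn

class IncrementalNet(nn.Module):
    """Shared backbone with one output head per task (all sizes illustrative)."""
    def __init__(self, in_dim=784, hidden=256, classes_per_task=2, n_tasks=5):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(hidden, classes_per_task) for _ in range(n_tasks)
        )

    def forward(self, x, task_id=None):
        h = self.backbone(x)
        if task_id is not None:
            # Task-IL: the supplied task identity selects the matching head.
            return self.heads[task_id](h)
        # Class-IL: no task identity at test, so score all classes seen so far.
        return torch.cat([head(h) for head in self.heads], dim=1)
```

Domain-IL would instead use a single head over the fixed label set, again without task identity at test time.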
Repetition-Enabled Variants (CI-R): Recent work argues that scenarios where classes reappear in later tasks—mirroring vision and robotics realities—better reflect practical deployments, with repetition naturally mitigating forgetting and enabling forward transfer (Cossu et al., 2021).
Multi-Modality Scenarios: For streams where modalities (image, audio, video, text, etc.) vary across tasks, scenario definitions hinge on whether modality identity is known at inference and whether modalities arrive in blocks, interleaved, or simultaneously. Challenges include cross-modal interference, embedding drift, and routing uncertainty (Jin et al., 11 Mar 2025).
Graph-Specific Extensions: Node, link, and graph classification tasks admit domain-, class-, task-, and time-incremental splits, with dynamic graph structures and evolving label spaces presenting further complexity (Ko et al., 2022).
2. Mathematical Formulation and Scenario-Specific Objectives
Each scenario entails a distinct problem formulation:
| Scenario | Objective | Constraints/Identifiers |
|---|---|---|
| Task-IL | $\min_\theta \sum_t \mathbb{E}_{(x,y)\sim D_t}\,[\ell(f_\theta(x,t),y)]$ | Task ID at test |
| Domain-IL | $\min_\theta \sum_t \mathbb{E}_{(x,y)\sim D_t}\,[\ell(f_\theta(x),y)]$, $y \in \mathcal{Y}$ | Fixed label set; unknown domain |
| Class-IL | $\min_\theta \sum_t \mathbb{E}_{(x,y)\sim D_t}\,[\ell(f_\theta(x),y)]$, $y \in \bigcup_{i\le t}\mathcal{Y}_i$ | Disjoint, growing label set; no task ID |
Regularization-based objectives append stability terms enforcing proximity to prior optima, e.g., $\mathcal{L}(\theta) = \mathcal{L}_t(\theta) + \frac{\lambda}{2}\sum_i \Omega_i (\theta_i - \theta_i^*)^2$, where $\theta^*$ denotes the parameters after previous tasks and $\Omega_i$ their estimated importance (Ven et al., 2024). Rehearsal objectives incorporate replay buffers $\mathcal{M}$: $\mathcal{L}(\theta) = \mathcal{L}_t(\theta) + \mathbb{E}_{(x,y)\sim\mathcal{M}}[\ell(f_\theta(x), y)]$.
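As a concrete illustration of these objectives, the sketch below implements an EWC-style quadratic penalty and a rehearsal loss; the importance weights `fisher`, the anchor parameters `theta_star`, and the replay batch are assumed to be maintained elsewhere (e.g., re-estimated after each task):

```python
import torch

def ewc_penalty(model, fisher, theta_star, lam):
    """Stability term (lam/2) * sum_i Omega_i (theta_i - theta*_i)^2,
    with importance Omega_i given here by a diagonal Fisher estimate."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - theta_star[name]) ** 2).sum()
    return 0.5 * lam * penalty

def rehearsal_loss(model, criterion, batch, replay_batch=None):
    """Current-task loss plus the same loss on a batch drawn from the buffer."""
    x, y = batch
    loss = criterion(model(x), y)
    if replay_batch is not None:
        x_r, y_r = replay_batch
        loss = loss + criterion(model(x_r), y_r)
    return loss
```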
Multi-modal scenarios require joint objectives summing task-specific, intra-modality regularization, cross-modal aggregation, and embedding re-alignment losses (Jin et al., 11 Mar 2025), schematically $\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda_1 \mathcal{L}_{\text{intra}} + \lambda_2 \mathcal{L}_{\text{cross}} + \lambda_3 \mathcal{L}_{\text{align}}$.
3. Implementation Protocols and Evaluation Metrics
Scenario instantiation dictates dataset partitioning, label mappings, and presence/absence of auxiliary information. Implementation frameworks such as Continuum and BeGin provide reproducible data loaders for class-, domain-, task-, and time-incremental splits (Douillard et al., 2021, Ko et al., 2022).
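For instance, Continuum wraps a base dataset into per-task splits; the snippet below follows its documented class-incremental usage (exact argument names may vary across versions, and the data path is a placeholder):

```python
from torch.utils.data import DataLoader
from continuum import ClassIncremental
from continuum.datasets import MNIST

# Split MNIST into 5 tasks of 2 classes each (class-incremental protocol).
dataset = MNIST("data/", download=True, train=True)
scenario = ClassIncremental(dataset, increment=2)

for task_id, taskset in enumerate(scenario):
    loader = DataLoader(taskset, batch_size=64, shuffle=True)
    # ... train on this increment, then evaluate on all tasks seen so far.
```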
Standard Metrics:
- Average Incremental Accuracy (AIA): Mean accuracy over the tasks seen so far, averaged after each increment.
- Final Average Accuracy (FAA): Mean accuracy over all tasks after training on the final task.
- Forgetting ($F$): Average drop in per-task accuracy from its peak to after final training, $F = \frac{1}{T-1}\sum_{j=1}^{T-1}\big(\max_{i<T} a_{i,j} - a_{T,j}\big)$, where $a_{i,j}$ is accuracy on task $j$ after training through task $i$.
- Backward Transfer (BWT): Effect of new tasks on old-task performance, $\mathrm{BWT} = \frac{1}{T-1}\sum_{j=1}^{T-1}(a_{T,j} - a_{j,j})$ (Ven et al., 2024).
- Forward Transfer (FWT): Improvement on future tasks attributable to past training. A computation sketch for these metrics follows this list.
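These metrics can be computed from a $T \times T$ accuracy matrix; a minimal sketch, assuming entry $a_{i,j}$ stores accuracy on task $j$ after training through task $i$:

```python
import numpy as np

def cl_metrics(A):
    """A[i, j]: accuracy on task j after training through task i (T x T)."""
    T = A.shape[0]
    faa = A[-1].mean()                                       # Final Average Accuracy
    aia = np.mean([A[i, : i + 1].mean() for i in range(T)])  # mean over seen tasks
    forgetting = np.mean([A[:-1, j].max() - A[-1, j] for j in range(T - 1)])
    bwt = np.mean([A[-1, j] - A[j, j] for j in range(T - 1)])  # Backward Transfer
    # FWT additionally requires random-initialization baselines, omitted here.
    return {"FAA": faa, "AIA": aia, "F": forgetting, "BWT": bwt}
```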
Replay buffer size, regularization strength ($\lambda$), and optimizer choice are scenario-sensitive. For long-tail task-size sequences, optimizer-state accumulation (e.g., continual AdamW) stabilizes retention and forward transfer (Kang et al., 2024).
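A sketch of the optimizer-state idea under these assumptions (the exact continual-AdamW recipe of Kang et al., 2024 may differ): the optimizer is instantiated once and never reset at task boundaries, so its moment estimates accumulate over the stream.

```python
import torch

model = torch.nn.Linear(784, 10)  # stand-in backbone
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

def train_task(model, opt, loader):
    for x, y in loader:
        opt.zero_grad()
        torch.nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

# Key point: the SAME `opt` object is reused for every task, so AdamW's
# first/second moments persist across task boundaries instead of resetting.
for loader in task_loaders:  # task_loaders: assumed iterable of per-task DataLoaders
    train_task(model, opt, loader)
```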
4. Representative Methods and Scenario Suitability
Task-IL scenarios accommodate regularization-based strategies (EWC, SI, MAS), dynamic architectures (Progressive Networks), and gating/expert mechanisms, exploiting known task boundaries (Aljundi, 2019, Ko et al., 2022).
Domain-IL scenarios favor replay, domain-adaptive normalization, and self-supervised auxiliary tasks; regularization provides modest retention in low-drift domains but suffers under large cumulative shifts (Armstrong et al., 2021).
Class-IL scenarios require replay mechanisms—either via raw exemplars (iCaRL, memory-aware rehearsal, GDumb), generated examples (DGR), or prototype-based representations—to prevent catastrophic collapse of old class decision boundaries (Ven et al., 2019). Regularization alone is ineffective in single-head label growth settings (Ven et al., 2019, Jin et al., 11 Mar 2025).
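The exemplar-buffer machinery underlying such methods can be sketched with reservoir sampling, a generic fixed-budget policy (iCaRL's herding-based exemplar selection differs in detail):

```python
import random

class ReservoirBuffer:
    """Fixed-budget exemplar buffer: every stream item has equal probability
    of residing in the buffer at any point (reservoir sampling)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, item):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = item  # evict uniformly at random

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```

Calling `add` on every incoming example keeps a uniform subsample of the stream within the memory budget, from which replay losses can then draw.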
Multi-modal continual learning introduces intra-modality regularization, cross-modal aggregation via weighted alignment of embeddings, and embedding re-alignment heads to counteract text-encoder dominance and embedding drift (Jin et al., 11 Mar 2025). Ablations confirm that all three components are critical for high accuracy and low forgetting.
For domains with repeated class appearances (CI-R, NIC), empirically measured forgetting is substantially reduced and forward transfer is enhanced over pure class-incremental protocols (Cossu et al., 2021, Douillard et al., 2021).
5. Empirical Scenario Difficulty and Best Practices
Empirical studies reveal relative scenario difficulty:
| Scenario | Typical Avg. Accuracy (Split-MNIST, No Replay) | Key Failure Modes |
|---|---|---|
| Task-IL | 97–98% | Mild forgetting; resource overhead |
| Domain-IL | 55–65% | Moderate forgetting; covariate shift |
| Class-IL | 19–22% | Severe catastrophic forgetting |
Replay-buffer methods approach the joint-training upper bound when the buffer budget is sufficient; generative replay scales better in high-class-count regimes (Hsu et al., 2018, Ven et al., 2019, Bhatt et al., 2024).
Best practices mandate explicit scenario declaration, examination of forward/backward transfer, memory-compute trade-offs, robustness to task order, and repetition-aware benchmarks for real-world fidelity (Cossu et al., 2021, Hsu et al., 2018, Douillard et al., 2021). Hyperparameters (especially learning rate and replay buffer size) show disproportionate importance for stream accuracy and forgetting mitigation; functional ANOVA enables adaptive subspace selection for efficient HPO in CL streams (Semola et al., 2024).
6. Extensions: Modality, Concept Drift, Privacy, and Graphs
Continual learning for multi-modality streams introduces unique interference—overwriting and embedding bias—requiring regularized modularity and explicit cross-modal aggregation (Jin et al., 11 Mar 2025).
Under concept drift, hybrid replay mechanisms using centroid-driven buffers with reactive subspace tracking combine retention of still-valid concepts with adaptive forgetting of outdated ones (Korycki et al., 2021).
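A heavily simplified, hypothetical sketch of the centroid-driven part of this idea (the reactive subspace tracking of Korycki et al., 2021 is omitted): when a class's stored exemplars sit far from its current centroid, the most distant ones are evicted so the buffer tracks the live concept.

```python
import numpy as np

def refresh_buffer(buffer, centroids, drift_threshold=0.5):
    """buffer: dict class -> list of feature vectors; centroids: dict class ->
    current centroid estimate. Evicts stale exemplars under suspected drift."""
    for cls, items in buffer.items():
        dists = np.linalg.norm(np.stack(items) - centroids[cls], axis=1)
        if dists.mean() > drift_threshold:  # concept likely outdated
            keep = np.argsort(dists)[: max(1, len(items) // 2)]
            buffer[cls] = [items[i] for i in keep]
    return buffer
```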
Privacy-sensitive and model-centric settings, as in Ex-Model CL, shift learning from raw data to streams of pre-trained experts, optimizing knowledge distillation over surrogate buffers. These settings emphasize architecture interface constraints and surrogate quality bottlenecks (Carta et al., 2021).
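Schematically, this setting reduces to distilling each incoming expert into the continual student over surrogate inputs; the sketch below uses standard temperature-scaled distillation (the actual Ex-Model objective in Carta et al., 2021 may differ):

```python
import torch
import torch.nn.functional as F

def distill_from_expert(student, expert, surrogate_loader, opt, T=2.0):
    """One distillation pass: the student never sees raw task data, only
    surrogate inputs, and matches the expert's softened predictions."""
    expert.eval()
    for x in surrogate_loader:  # surrogate inputs, no ground-truth labels
        with torch.no_grad():
            teacher_logits = expert(x)
        loss = F.kl_div(
            F.log_softmax(student(x) / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        opt.zero_grad()
        loss.backward()
        opt.step()
```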
Graph continual learning generalizes all scenarios to node, link, and graph classification with additional graph dynamics (time-incremental) and dependency-aware partitioning. Modular frameworks (e.g., BeGin) facilitate reproducibility and correctness for large-scale benchmarks (Ko et al., 2022).
7. Scenario Selection, Real-World Applications, and Future Directions
Proper scenario selection—informed by input drift, label evolution, side-information availability, and modality/repetition structure—governs empirical evaluation and method design. Autonomous detection, lifelong vision, healthcare analytics, and audio analysis each instantiate protocol-specific challenges, e.g., multi-modal interference, longitudinal domain shifts, and dynamic graph topology (Nasir et al., 15 Jul 2025, Armstrong et al., 2021, Bhatt et al., 2024).
Future research directions include large-scale repetition-enabled streams, adaptive hyperparameter tuning, buffer-efficient replay, scalable generative replay for high-task regimes, and application of scenario-specific protocols to streaming modalities, graph data, and federated settings.
The rigorous separation of scenario assumptions remains critical for fair benchmarking, understanding of catastrophic forgetting, and correct interpretation of empirical results. The unified taxonomy provides both practical guidance for method design and a foundation for deeper theoretical analysis of stability–plasticity trade-offs under continual non-stationary learning (Ven et al., 2024).