
Fast-Slow Training: Curriculum in Continual Learning

Updated 16 May 2026
  • Fast-Slow Training is a curriculum-based approach in continual learning that orders tasks either easy-to-hard or hard-to-easy to test whether models benefit from task ordering.
  • It employs defined benchmarks (M2I and I2M) and metrics (ACC, BWT, FWT) to reveal challenges like catastrophic forgetting in modern CL methods.
  • The protocol motivates the development of algorithms that leverage curriculum structure for improved retention and forward/backward transfer in heterogeneous task streams.

Fast-Slow Training (FST) is a curriculum-based protocol for continual learning (CL) that systematically varies the complexity and quality of tasks presented in sequence. The approach is formalized as two explicit curricula: one with increasing task complexity (“fast-to-slow,” or “easy-to-hard”) and its exact reverse (“slow-to-fast,” or “hard-to-easy”). Together they test whether existing continual learning models can exploit curriculum ordering to improve stability and forward and backward transfer across highly heterogeneous tasks. FST, as defined in “From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning” (Faber et al., 2023), provides a reproducible experimental ground for curriculum evaluation in CL. It reveals fundamental challenges for modern continual learners, most notably incomplete exploitation of curriculum structure and persistent catastrophic forgetting under realistic, multi-domain benchmarks.

1. Motivation and Rationale

Continual learning research has predominantly focused on evaluating catastrophic forgetting and knowledge transfer in artificially constructed benchmarks with abrupt and homogeneous task boundaries. FST addresses the limitations of these settings by:

  • Introducing a curriculum that sequences tasks according to visual complexity and domain difficulty, echoing classic principles of curriculum learning (easy tasks before hard tasks).
  • Systematically comparing forward and backward transfer, as well as forgetting, under both “fast-to-slow” (easy-to-hard) and “slow-to-fast” (hard-to-easy) curricula.
  • Testing whether CL algorithms, especially popular replay- and regularization-based approaches, can capitalize on the additional structure brought by curriculum ordering when facing realistic, heterogeneous task sequences.

This protocol provides new insight into the ability (or lack thereof) of CL models to exploit inherent structure in task ordering, a critical but under-explored axis in practical, non-synthetic continual learning.

2. Benchmark Construction and Curriculum Formalization

FST defines two curriculum-based benchmarks:

  • M2I (“MNIST→TinyImageNet”): Tasks are ordered by increasing complexity:

    1. MNIST (handwritten digits; B&W, low-res)
    2. Omniglot (handwritten letters; B&W)
    3. Fashion-MNIST (clothing items; B&W)
    4. SVHN (street numbers; RGB)
    5. CIFAR-10 (objects; RGB)
    6. TinyImageNet (natural images; higher-res, RGB)
  • I2M (“TinyImageNet→MNIST”): The reverse order, progressing from most complex to least.

All datasets are preprocessed to 64×64 px, and B&W images are replicated to match RGB format. Each task consists of 10 balanced classes, with 500 train and 500 test images per class. The progression in M2I reflects an increase in visual and domain complexity, while I2M mirrors a decreasing curriculum.

FST does not use a formal complexity scoring function $s(T)$; the ordering is based on domain expertise regarding visual difficulty (e.g., B&W digits < letters < simple objects < natural images).
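
The two task orders and the preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `to_rgb_64` is hypothetical, and nearest-neighbour resizing is an assumption, since the interpolation method is not stated.

```python
import numpy as np

# Task order for M2I as listed above; I2M is the exact reverse.
M2I = ["MNIST", "Omniglot", "Fashion-MNIST", "SVHN", "CIFAR-10", "TinyImageNet"]
I2M = list(reversed(M2I))

TARGET_SIZE = 64  # all images are brought to 64x64 px


def to_rgb_64(image: np.ndarray) -> np.ndarray:
    """Convert an HxW (B&W) or HxWx3 (RGB) image to 64x64x3.

    Nearest-neighbour resizing is an assumption made for brevity.
    """
    if image.ndim == 2:
        # Replicate the single B&W channel three times to match RGB.
        image = np.repeat(image[:, :, None], 3, axis=2)
    height, width = image.shape[:2]
    # For each output row/column, pick the nearest source row/column.
    rows = np.arange(TARGET_SIZE) * height // TARGET_SIZE
    cols = np.arange(TARGET_SIZE) * width // TARGET_SIZE
    return image[rows][:, cols]
```

A 28×28 MNIST-style array and a 32×32×3 CIFAR-style array both come out with shape (64, 64, 3), so every task can feed the same network input layer.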

3. Evaluation Protocols and Metrics

Evaluation in FST separates two key test scenarios:

  • Class-Incremental (CI): No task identity at test time; the model classifies among all classes observed so far.
  • Task-Incremental (TI): Task identity is given at test time; classification is limited to classes from the active task.
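
The difference between the two scenarios comes down to which output units are eligible at prediction time. The sketch below assumes a single output head whose classes are laid out contiguously by task (task t owns indices 10t to 10t+9); that layout and the function name `predict` are illustrative assumptions, not details from the source.

```python
from typing import Optional

import numpy as np

CLASSES_PER_TASK = 10  # each FST task has 10 classes


def predict(logits: np.ndarray, tasks_seen: int, task_id: Optional[int] = None) -> int:
    """Return a global class index from one logit vector.

    Assumes (hypothetically) a contiguous class layout per task.
    """
    if task_id is None:
        # Class-incremental: choose among every class observed so far.
        lo, hi = 0, tasks_seen * CLASSES_PER_TASK
    else:
        # Task-incremental: restrict to the classes of the given task.
        lo = task_id * CLASSES_PER_TASK
        hi = lo + CLASSES_PER_TASK
    return lo + int(np.argmax(logits[lo:hi]))
```

Because CI must discriminate among all classes seen so far, a confident logit from another task can override the correct one, which is one reason CI is usually the harder setting.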

Metrics are formalized as:

  • Average Accuracy (ACC):

$$\mathrm{ACC} = \frac{1}{N} \sum_{j=1}^{N} R_{N, j}$$

where $R_{i, j}$ is the test accuracy on task $j$ after training on task $i$.

  • Backward Transfer (BWT):

$$\mathrm{BWT} = \frac{1}{N-1} \sum_{i=1}^{N-1} (R_{N, i} - R_{i, i})$$

Negative BWT indicates forgetting.

  • Forward Transfer (FWT):

$$\mathrm{FWT} = \frac{1}{N-1} \sum_{i=1}^{N-1} R_{i, i+1}$$

Only defined in the TI scenario.
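
All three metrics can be read off the accuracy matrix $R$. The following sketch assumes $R$ is stored as an N×N array with zero-based indices, so `R[i, j]` holds the accuracy on task j+1 after training on task i+1; the function name `cl_metrics` and the toy matrix are invented for illustration and are not results from the benchmark.

```python
import numpy as np


def cl_metrics(R):
    """Compute (ACC, BWT, FWT) from an N x N accuracy matrix."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # ACC: mean accuracy over all tasks after training on the last task.
    acc = R[n - 1].mean()
    # BWT: final accuracy on each earlier task minus the accuracy
    # measured right after that task was learned.
    bwt = (R[n - 1, : n - 1] - np.diag(R)[: n - 1]).mean()
    # FWT: accuracy on the next, not yet trained task (first superdiagonal).
    fwt = np.diag(R, k=1).mean()
    return acc, bwt, fwt


# Toy 3-task matrix with made-up numbers.
toy_R = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.20],
    [0.70, 0.80, 0.90],
]
acc, bwt, fwt = cl_metrics(toy_R)
```

For the toy matrix, ACC = (0.70 + 0.80 + 0.90)/3 = 0.80, BWT = ((0.70 − 0.90) + (0.80 − 0.85))/2 = −0.125, and FWT = (0.10 + 0.20)/2 = 0.15. In the protocol, FWT would only be reported when the matrix comes from a TI evaluation.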

Two network backbones—Wide-VGG9 and EfficientNet-B1—are used. Training settings are standardized (SGD, lr=0.001, momentum=0.9, 50 epochs per task), and memory-replay methods store 200 exemplars (≈20 per task).

CL strategies are drawn from three major families, complemented by two baselines:

  • Regularization-based: EWC, SI, MAS, LwF
  • Replay-based: naive Replay, GEM, AGEM, GDumb
  • Architecture-based: CWR⋆
  • Baselines: Naive fine-tuning (no anti-forgetting) and Cumulative upper bound (retraining from scratch on all data so far).
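
To make the 200-exemplar memory budget concrete, here is a minimal fixed-capacity buffer. The selection policy is an assumption: the text does not say how exemplars are chosen, so this sketch uses reservoir sampling, which keeps a uniform random subset of everything seen so far. The class name `ReplayBuffer` is hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size memory filled by reservoir sampling (assumed policy)."""

    def __init__(self, capacity=200, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the example.
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def sample(self, batch_size):
        """Draw a replay mini-batch without replacement."""
        return self.rng.sample(self.items, min(batch_size, len(self.items)))
```

With six tasks of 5,000 training images each (10 classes × 500 images), the buffer retains well under 1% of the stream, which illustrates how stringent the memory constraint is for the replay-based methods.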

4. Empirical Findings and Quantitative Outcomes

The table below summarizes key empirical results for several CL methods under M2I and I2M, using the Wide-VGG9 backbone. Higher is better for all three metrics; FWT is reported for the TI scenario only.

| Scenario | Order | Method | ACC | BWT | FWT (TI only) |
|---|---|---|---|---|---|
| CI | M2I | Cumulative | 0.868 | +0.004 | - |
| CI | M2I | Replay | 0.755 | −0.038 | - |
| CI | M2I | GEM | 0.572 | −0.074 | - |
| CI | M2I | Naive FT | 0.213 | −0.261 | - |
| TI | M2I | Cumulative | 0.819 | +0.012 | 0.102 |
| TI | M2I | Replay | 0.730 | −0.086 | 0.101 |
| CI | I2M | Cumulative | 0.735 | +0.012 | - |
| CI | I2M | Replay | 0.550 | −0.047 | - |
| TI | I2M | Cumulative | 0.663 | +0.018 | 0.120 |
| TI | I2M | Replay | 0.571 | −0.061 | 0.135 |

Key observations:

  • All standard CL methods except “Cumulative” incur forgetting (BWT < 0); in the table above it is most pronounced for naive fine-tuning in CI (BWT = −0.261).
  • Replay-based approaches are the most robust, but still underperform the cumulative oracle by a wide margin.
  • Regularization and naive fine-tuning collapse entirely on later tasks (ACC ≈ 0.1–0.3).
  • Forward Transfer remains modest (FWT ≈ 0.10–0.13), indicating limited generalization to future tasks.
  • The direct curriculum (M2I) yields higher ACC than the reverse (I2M) in every comparable row of the table above, but no strategy fully leverages task ordering for strong positive transfer.
  • EfficientNet-B1 yields higher absolute ACC but the same qualitative hierarchy: Cumulative > Replay > GEM.
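
The gaps behind these observations can be checked with simple arithmetic on the ACC column of the table above. The helper names below are hypothetical; the numbers are copied from the table and nothing else is assumed.

```python
# ACC values copied from the table above (Wide-VGG9 backbone).
acc = {
    ("CI", "M2I", "Cumulative"): 0.868,
    ("CI", "M2I", "Replay"): 0.755,
    ("TI", "M2I", "Cumulative"): 0.819,
    ("TI", "M2I", "Replay"): 0.730,
    ("CI", "I2M", "Cumulative"): 0.735,
    ("CI", "I2M", "Replay"): 0.550,
    ("TI", "I2M", "Cumulative"): 0.663,
    ("TI", "I2M", "Replay"): 0.571,
}


def gap_to_cumulative(scenario, order, method="Replay"):
    """How far a method's ACC falls below the cumulative upper bound."""
    return acc[(scenario, order, "Cumulative")] - acc[(scenario, order, method)]


def order_drop(scenario, method):
    """ACC lost when switching from M2I to I2M for the same method."""
    return acc[(scenario, "M2I", method)] - acc[(scenario, "I2M", method)]


for scenario in ("CI", "TI"):
    for order in ("M2I", "I2M"):
        gap = gap_to_cumulative(scenario, order)
        print(f"{scenario} {order}: Replay is {gap:.3f} below Cumulative")
    print(f"{scenario}: Replay loses {order_drop(scenario, 'Replay'):.3f} ACC under I2M")
```

For instance, Replay in the CI scenario trails Cumulative by 0.868 − 0.755 = 0.113 under M2I and by 0.735 − 0.550 = 0.185 under I2M, so the reverse order widens the gap to the upper bound as well as lowering absolute accuracy.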

5. Analysis of Curriculum Effects and Ordering Sensitivity

FST results show that the easy-to-hard curriculum (M2I) yields higher ACC than hard-to-easy (I2M) for every method and scenario reported above. For example, Replay in the CI setting achieves ACC = 0.755 under M2I but drops to 0.550 under I2M, and Cumulative drops from 0.868 to 0.735. The advantage does not extend to every metric: in the TI setting, FWT is slightly higher under I2M (0.120 and 0.135) than under M2I (0.102 and 0.101).

Despite this, detailed heatmap analyses of $R_{i, j}$ show that popular CL methods remain largely unable to exploit the curriculum for significant backward or forward transfer improvements:

  • Replay and GEM achieve the least negative BWT (i.e., minimal forgetting), even with stringent memory constraints.
  • CWR⋆ only retains knowledge of the initial (Task 1) dataset, failing to adapt to subsequent tasks.
  • Regularization methods are universally ineffective under challenging, heterogeneous multi-domain benchmarks as instantiated by FST.

The results suggest that curriculum structure remains an unrealized opportunity for existing CL algorithms.

6. Implications and Directions for Continual Learning Research

FST establishes an experimental standard for evaluating the role of curriculum ordering in continual learning under real-world, highly heterogeneous streams. The findings motivate new directions:

  • Algorithmic innovation: Development of strategies that explicitly track and leverage task complexity, ordering, and transferability.
  • Benchmark advancement: Adoption of multi-domain, curriculum-based protocols that better reflect real deployment non-stationarities.
  • Robustness focus: Emphasis on designing CL algorithms that reduce catastrophic forgetting and exhibit positive transfer over structured, curriculum-ordered streams.

Future work should address the mechanisms by which curricula can be transformed from a passive ordering variable into an active driver of improved retention, generalization, and sample efficiency in CL.

7. Concluding Remarks

Fast-Slow Training provides a rigorous, reproducible protocol for probing the effects of curriculum structure on the stability, plasticity, and transfer properties of continual learning algorithms. Empirical evidence from FST specifies the limitations of current methodologies and delineates essential desiderata for progress in real-world CL scenarios (Faber et al., 2023). Researchers are encouraged to apply curriculum-driven benchmarks such as FST to reveal the robustness and generalization boundaries of emerging continual learning solutions.
