Retrospective Augmented Rehearsal (RAR)
- RAR is an online continual learning method that integrates repeated data exposure with stochastic augmentations to counter both underfitting and memory overfitting.
- It leverages a biased empirical risk formulation and RL-based hyperparameter tuning to optimize performance across benchmarks such as CIFAR100 and MiniImageNet.
- RAR aligns memory and test loss landscapes, resulting in robust generalization and significant accuracy gains over traditional experience replay methods.
Retrospective Augmented Rehearsal (RAR) is a method for online continual learning (OCL) that addresses the dual challenge of underfitting new data and overfitting limited episodic memory. RAR generalizes rehearsal-based learning by combining repeated exposures to new data with stochastic data augmentations applied to both current and memory samples. This approach yields substantially improved risk minimization and empirical accuracy on a wide range of OCL benchmarks, outperforming both vanilla experience replay (ER) and several state-of-the-art variants. RAR provides not only practical gains but also novel theoretical insight into the memory bias and loss landscape approximation properties of rehearsal in OCL (Zhang et al., 2022).
1. Online Continual Learning Problem Formulation
In OCL, a learner observes a non-stationary stream of mini-batches $\{B_t\}_{t=1}^{T}$, where each $B_t$ is sampled from a current-task distribution $D_t$. A fixed-size episodic memory $\mathcal{M}$, containing up to $M$ past examples selected online (typically using reservoir sampling), provides limited storage for rehearsal. At each time $t$, a memory batch $B_{\mathcal{M}}$ is sampled from $\mathcal{M}$, and the model is updated via gradient descent on both current and memory samples.
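The reservoir-sampling rule mentioned above can be sketched as follows; a minimal standalone implementation in which the function and variable names (`reservoir_update`, `capacity`) are illustrative, not from the paper:

```python
import random

def reservoir_update(memory, item, n_seen, capacity):
    """Insert `item` into `memory` so that, after n_seen + 1 stream items,
    every item seen so far remains stored with equal probability."""
    if len(memory) < capacity:
        memory.append(item)           # memory not yet full: always store
    else:
        j = random.randint(0, n_seen) # uniform index over all items seen
        if j < capacity:
            memory[j] = item          # replace a random slot
    return memory

# Feed a stream of 1000 items into a 50-slot memory.
memory, capacity = [], 50
for n_seen, item in enumerate(range(1000)):
    reservoir_update(memory, item, n_seen, capacity)
```

After the loop, the memory holds a uniform sample of the whole stream, which is what makes the bias analysis in the next section tractable.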
The continual learning risk is formalized as
$$\mathcal{R}(f) = \sum_{t} \mathbb{E}_{(x,y)\sim D_t}\big[\ell(f(x), y)\big],$$
where $f$ is the model and $\ell$ is the per-sample loss (e.g., cross-entropy). The vanilla ER update at each step is
$$\theta \leftarrow \theta - \eta\,\nabla_\theta\big[L(B_t; \theta) + L(B_{\mathcal{M}}; \theta)\big],$$
where $L(B; \theta)$ denotes the average loss over a batch. Vanilla ER achieves strong empirical results but remains susceptible to biased risk estimation and catastrophic forgetting, owing to the limited representativeness of the memory and overfitting to its contents.
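The ER update above can be illustrated with a toy model; a sketch using a linear model under squared loss (the paper uses neural networks and cross-entropy, so names like `er_step` and the loss choice are stand-ins):

```python
import numpy as np

def er_step(w, B_cur, B_mem, lr=0.05):
    """One vanilla-ER SGD step: the gradient is taken on the
    concatenation of the current batch B_t and the memory batch B_M."""
    X = np.vstack([B_cur[0], B_mem[0]])
    y = np.concatenate([B_cur[1], B_mem[1]])
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    return w - lr * grad

mse = lambda w, X, y: float(np.mean((X @ w - y) ** 2))
rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w0 = np.zeros(3)
w1 = er_step(w0, (X[:4], y[:4]), (X[4:], y[4:]))  # current + memory halves
```

A single step lowers the combined loss, which is all vanilla ER guarantees; whether that loss is the *right* one is exactly the bias question raised next.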
2. Biased Empirical Risk Minimization and Memory Overfitting
If memory is updated via reservoir sampling, vanilla ER optimizes a biased empirical risk
$$\hat{\mathcal{R}}_{\mathrm{ER}}(f) = \frac{1}{n_{\mathrm{new}}} \sum_{(x,y)\,\in\,\mathrm{stream}} \ell(f(x), y) \;+\; \frac{1}{M} \sum_{(x,y)\in\mathcal{M}} \ell(f(x), y),$$
where $n_{\mathrm{new}}$ and $n_{\mathrm{old}}$ denote the counts of current-task and past samples, and each stored example is implicitly upweighted by the task-to-memory size ratio $\rho = n_{\mathrm{old}}/M$ relative to the unbiased empirical risk over all $n_{\mathrm{new}} + n_{\mathrm{old}}$ samples. Key phenomena include:
- Bias & Overfitting: the factor $\rho$ upweights memory samples, increasing overfitting risk for large $\rho$.
- Underfitting: a single pass over the stream gives insufficient exposure to new tasks.
- Temporal Dynamics: $\rho = n_{\mathrm{old}}/M$ grows without bound as history accumulates, so the bias persists.
While repeated rehearsal (multiple inner updates, $K > 1$, per incoming batch) can mitigate underfitting, it accelerates memory overfitting, as the contribution from the incoming batch rapidly diminishes over successive iterations.
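The temporal dynamics above are easy to quantify; a small numeric illustration (the capacity and history sizes are made-up values, not from the paper):

```python
# The implicit upweighting factor rho = n_old / M applied to each stored
# sample grows linearly with the history a fixed-size memory must represent.
M = 1000                                    # memory capacity (assumed)
history = [1_000, 5_000, 20_000, 100_000]   # n_old after successive tasks
rhos = [n_old / M for n_old in history]
# After 100k past samples, each stored example stands in for 100 of them,
# so repeatedly rehearsing it carries 100x its fair weight in the risk.
```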
3. Retrospective Augmented Rehearsal: Algorithm and Objective
RAR integrates repeated rehearsal with stochastic augmentation. Let $\mathcal{T}$ denote a family of stochastic transformations (e.g., RandAugment), with $\tau \sim \mathcal{T}$ a sampled augmentation. At each step $t$, for each inner iteration $k = 1, \dots, K$:
- Sample a memory batch $B_{\mathcal{M}}^{(k)}$ from $\mathcal{M}$.
- Draw $\tau_k \sim \mathcal{T}$.
- Form an augmented batch $\tilde{B}^{(k)} = \tau_k(B_t) \cup \tau_k\big(B_{\mathcal{M}}^{(k)}\big)$.
- Apply the SGD update $\theta \leftarrow \theta - \eta\,\nabla_\theta L\big(\tilde{B}^{(k)}; \theta\big)$.
The total RAR loss per sample is $\mathbb{E}_{\tau\sim\mathcal{T}}\big[\ell(f(\tau(x)), y)\big]$. RAR thus implements SGD on an augmented biased risk objective with integrated augmentation and memory weighting:
$$\hat{\mathcal{R}}_{\mathrm{RAR}}(f) = \frac{1}{n_{\mathrm{new}}} \sum_{(x,y)\,\in\,\mathrm{stream}} \mathbb{E}_{\tau\sim\mathcal{T}}\big[\ell(f(\tau(x)), y)\big] \;+\; \frac{1}{M} \sum_{(x,y)\in\mathcal{M}} \mathbb{E}_{\tau\sim\mathcal{T}}\big[\ell(f(\tau(x)), y)\big].$$
Empirical ablations confirm that both repetition ($K > 1$) and augmentation are needed for consistent gains; using only one of the two is often detrimental.
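The inner loop above can be sketched on the same toy linear model; Gaussian jitter stands in for image-space RandAugment, and the names (`rar_update`, `sample_mem`) are illustrative:

```python
import numpy as np

def rar_update(w, B_cur, sample_mem, augment, K=4, lr=0.05):
    """RAR-style update (sketch): K inner SGD iterations on one incoming
    batch, each with a freshly sampled memory batch and a freshly drawn
    stochastic augmentation applied to current and memory samples alike."""
    for _ in range(K):
        Xm, ym = sample_mem()                  # resample memory each repeat
        X = np.vstack([B_cur[0], Xm])
        y = np.concatenate([B_cur[1], ym])
        X = augment(X)                         # stochastic augmentation
        grad = 2 * X.T @ (X @ w - y) / len(y)  # squared-loss gradient
        w = w - lr * grad
    return w

rng = np.random.default_rng(1)
Xs, ys = rng.normal(size=(64, 3)), rng.normal(size=64)

def sample_mem(batch=8):
    idx = rng.integers(0, len(Xs), size=batch)
    return Xs[idx], ys[idx]

# Gaussian jitter as a toy stand-in for RandAugment.
augment = lambda X: X + 0.05 * rng.normal(size=X.shape)
w = rar_update(np.zeros(3), (Xs[:8], ys[:8]), sample_mem, augment, K=4)
```

Resampling the memory batch and redrawing the augmentation inside every repeat is the point: repetition alone would hammer the same samples $K$ times, which is the overfitting failure mode Section 2 describes.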
4. Loss Landscape and Ridge Aversion
It is established that ER solutions can align with high-loss ridges in the past-task loss landscape, producing memory overfitting even while the memory loss appears low. RAR aligns the memory-based and test-based loss contours, so that the optimization endpoint lies in a low-loss valley for both criteria. A quantitative metric for this effect is the gap
$$\Delta = \mathcal{L}_{\mathrm{test}}(\theta) - \mathcal{L}_{\mathcal{M}}(\theta)$$
between the past-task test loss and the memory loss at the learned solution. RAR achieves a significantly lower $\Delta$, demonstrating superior ridge aversion and an improved correspondence between memory risk and true risk. This property is not observed for pure repetition or pure augmentation in isolation.
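The gap diagnostic can be demonstrated on a deliberately overfit toy model; everything here (the `memory_gap` helper, the exactly-solvable 3-sample "memory") is an assumed illustration, not the paper's measurement protocol:

```python
import numpy as np

def memory_gap(loss, w, mem, heldout):
    """Difference between held-out past-task loss and memory loss at a
    solution w. A large positive gap means w sits low on the memory loss
    but high on the true past-task loss -- the 'ridge' failure mode."""
    return loss(w, *heldout) - loss(w, *mem)

mse = lambda w, X, y: float(np.mean((X @ w - y) ** 2))
rng = np.random.default_rng(2)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
Xm, ym = X[:3], y[:3]                        # tiny "memory": exactly solvable
w = np.linalg.lstsq(Xm, ym, rcond=None)[0]   # fit the memory (near) perfectly
gap = memory_gap(mse, w, (Xm, ym), (X[3:], y[3:]))
```

The memory loss is essentially zero while the held-out loss is not, so the gap is strictly positive: a low memory loss by itself says nothing about the past-task risk.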
5. RL-Based Hyperparameter Auto-Tuning
RAR introduces critical hyperparameters:
- $K$: the number of rehearsal repeats,
- $m$: the augmentation strength (RandAugment applies randomly selected operators at magnitude $m$).
Given the challenge of hyperparameter selection in absence of external validation, a multi-armed bandit with bootstrapped policy gradient (BPG) is employed. The key features of this scheme are:
- State: None; bandit is fully observable.
- Action: a pair $(K, m)$, sampled from a policy $\pi_\phi$.
- Reward: $r = -\left|\mathrm{acc}_{\mathcal{M}} - \mathrm{acc}^{*}\right|$, where $\mathrm{acc}_{\mathcal{M}}$ is the memory-batch training accuracy and $\mathrm{acc}^{*}$ is a target value.
- Policy Update: $\phi$ is updated via BPG, using sets of "better"/"worse" actions to increase $\log \pi_\phi(a)$ for the better actions and decrease it for the worse ones.
The bandit converges within a few tasks, adaptively balancing stability-plasticity and improving robustness.
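The bandit tuner can be sketched as a softmax policy over discrete $(K, m)$ arms. This simplifies the paper's bootstrapped policy gradient to plain REINFORCE with a running baseline, and the `acc_mem` accuracy model is entirely made up to stand in for the real training signal; only the reward shape $r = -|\mathrm{acc}_{\mathcal{M}} - \mathrm{acc}^{*}|$ follows the description above:

```python
import numpy as np

arms = [(K, m) for K in (1, 4, 8) for m in (5, 10, 14)]  # assumed grid
logits = np.zeros(len(arms))
rng = np.random.default_rng(3)
acc_target = 0.75

def acc_mem(K, m):
    # Hypothetical environment: accuracy rises with repeats, falls with
    # stronger augmentation.
    return min(1.0, 0.5 + 0.05 * K - 0.01 * m)

baseline = 0.0
for step in range(500):
    p = np.exp(logits - logits.max()); p /= p.sum()   # softmax policy
    a = rng.choice(len(arms), p=p)                    # sample an arm
    r = -abs(acc_mem(*arms[a]) - acc_target)          # reward shape from text
    baseline = 0.9 * baseline + 0.1 * r               # running baseline
    grad = -p.copy(); grad[a] += 1.0                  # d log pi(a) / d logits
    logits += 0.2 * (r - baseline) * grad             # REINFORCE ascent

best = arms[int(np.argmax(logits))]
```

With no state, the policy gradient reduces to reweighting arms by advantage, so mass drifts toward the $(K, m)$ pair whose memory accuracy tracks the target.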
6. Empirical Performance on OCL Benchmarks
RAR achieves substantial performance gains on four OCL benchmarks: Seq-CIFAR100, Seq-MiniImageNet, CORE50-NC, and CLRS25-NC, with memory budgets on the order of thousands of examples. The following table summarizes the improvement in average accuracy (percentage points):
| Benchmark | ER Baseline | ER-RAR | Gain |
|---|---|---|---|
| CIFAR100, k | 19.0 | 27.8 | +8.8 |
| MiniImageNet, k | 20.0 | 30.0 | +10.0 |
| CORE50, k | 24.0 | 39.3 | +15.3 |
| CLRS25, k | 18.7 | 28.6 | +9.9 |
RAR also yields 5–18 point improvements when used to augment the state-of-the-art variants MIR, ASER, and SCR. Notably, ablation studies show that (i) repetition alone or augmentation alone often reduces performance, and only their combination secures robust gains, and (ii) increasing $K$ beyond roughly 10 yields diminishing, but not negative, returns.
7. Hyperparameter Configuration and Practical Guidelines
Best practice suggestions include:
- Default: a moderate repeat count $K$ with RandAugment applied to both incoming and memory batches.
- For a high task-to-memory ratio $\rho$ (overfitting risk): stronger augmentation, fewer repeats.
- For a low $\rho$ (underfitting risk): more repeats, lighter augmentation.
- The RL-based adaptation typically doubles runtime, but provides robustness and requires only a single additional pass.
RAR represents a minimal, generalizable extension to rehearsal-based OCL that simultaneously counters underfitting (by repeated rehearsal) and memory overfitting (by augmentation), yielding accurate empirical risk and alignment with the true loss surface (Zhang et al., 2022).