Repeated Augmented Rehearsal (RAR)
- RAR is a rehearsal-based approach to online continual learning that augments experience replay with repeated, independently augmented rehearsal iterations at each training step.
- By interleaving repeated augmentation and memory replay, it more accurately approximates the loss landscape of previously encountered tasks while balancing stability and plasticity.
- Reinforcement-learning-based hyperparameter adaptation yields significant performance improvements across a variety of continual learning benchmarks.
Repeated Augmented Rehearsal (RAR) is a rehearsal-based approach to online continual learning (OCL) designed to mitigate the underfitting–overfitting dilemma inherent in memory-constrained experience replay. RAR systematically interleaves multiple rehearsal iterations and online data augmentation at each training step to yield an improved empirical approximation of the loss landscape of previously encountered tasks. It further introduces reinforcement learning (RL)-based hyperparameter adaptation to dynamically balance stability and plasticity during training, achieving substantial improvements over baseline and state-of-the-art replay methods across a variety of benchmarks (Zhang et al., 2022).
1. Formalization of Online Continual Learning and Rehearsal Dynamics
In OCL, a model observes data via a non-stationary stream of mini-batches $B_1, B_2, \dots$ drawn from distributions $\mathcal{D}_1, \mathcal{D}_2, \dots$, each contiguous segment with a fixed distribution defining a "task." A finite memory buffer $\mathcal{M}$ of size $|\mathcal{M}|$ retains a subset of previously seen examples, with reservoir sampling used to refresh its contents.
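The buffer refresh is standard reservoir sampling, which guarantees every observed example the same probability of residing in the buffer. A minimal self-contained sketch in Python (function and variable names are illustrative, not from the paper):

```python
import random

def reservoir_update(memory, capacity, new_examples, n_seen):
    """Reservoir sampling: after n items, each one is stored with probability capacity/n."""
    for ex in new_examples:
        if len(memory) < capacity:
            memory.append(ex)                # buffer not yet full: always store
        else:
            j = random.randint(0, n_seen)    # uniform index over all items seen so far
            if j < capacity:
                memory[j] = ex               # replace a random slot w.p. capacity/(n_seen+1)
        n_seen += 1
    return memory, n_seen
```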
At each step $t$, two batches are sampled: the current stream batch $B_t$ and a memory batch $B^M_t \subset \mathcal{M}$. The cumulative empirical loss is minimized with a single pass over incoming data, but both batches are used for training:

$$\theta_{t+1} = \theta_t - \alpha \,\nabla_\theta \sum_{(x,y)\,\in\, B_t \cup B^M_t} \ell\big(f_\theta(x), y\big).$$
This approach, known as experience replay (ER), presents challenges due to the small and dynamic memory’s inability to fully represent past data distributions.
2. Theoretical Perspective: Dynamic Empirical Risk and Memory Overfitting
Analysis of ER in the online setting reveals that memory-based updates perform gradient descent on a biased, dynamic empirical risk of the form

$$\widetilde{\mathcal{L}}_t(\theta) = \mathcal{L}_{B_t}(\theta) + w_t\,\mathcal{L}_{\mathcal{M}}(\theta),$$

with the memory weight $w_t$ determined by the task-to-memory ratio $\rho$ and the stream position $t$. Here, $\rho$ reflects how much past data each buffer slot must represent, and $t$ tracks the sequence progression; together, they modulate the weighting of memory loss versus streaming data. Notably, the weight $w_t$ generally deviates from 1, introducing a risk of overfitting to the buffer. As the stream progresses, $t$ increases, but the relative scale set by $\rho$ remains, implying a persistent bias.
Multiple rehearsals ($K > 1$) of the same mini-batch exacerbate this overfitting, quickly driving the training loss on $B_t \cup B^M_t$ to zero while losses on the full past-task distributions improve only slowly, often placing solutions on high-loss ridges for past task distributions.
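The memory-overfitting effect is easy to reproduce in miniature. The toy sketch below (all data and names are synthetic and illustrative, not from the paper) repeatedly rehearses a tiny fixed buffer drawn from a "past task" and compares the buffer loss to the loss on the full past distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "past task": 50-D Gaussian features with noisy linear labels.
d, n = 50, 2000
w_true = rng.normal(size=d)
X_past = rng.normal(size=(n, d))
y_past = ((X_past @ w_true + rng.normal(scale=2.0, size=n)) > 0).astype(float)

buf = rng.choice(n, size=20, replace=False)   # tiny memory buffer: 20 of 2000 examples
w = np.zeros(d)

def log_loss(w, X, y):
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Repeated rehearsal on the fixed buffer only.
for _ in range(5000):
    Xb, yb = X_past[buf], y_past[buf]
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.5 * Xb.T @ (p - yb) / len(buf)     # logistic-regression gradient step

print(f"buffer loss:    {log_loss(w, X_past[buf], y_past[buf]):.3f}")  # driven near 0
print(f"past-task loss: {log_loss(w, X_past, y_past):.3f}")            # typically stays much higher
```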
3. Algorithmic Components of RAR
RAR introduces two main components at each step:
- Multiple rehearsal iterations per incoming mini-batch.
- Random data augmentation (e.g., RandAugment) independently applied at each iteration.
The training protocol at time $t$ is:
```
for k in 1 ... K:
    sample memory batch B^M_{t,k} ⊂ M
    form joint batch B_{t,k} = B_t ∪ B^M_{t,k}
    apply random augment: g_{t,k} ← sample from augmentation group G
    perform SGD update on loss:
        ℓ_RAR = ∑_{(x, y) ∈ B_{t,k}} ℓ( f_θ( g_{t,k}(x) ), y )
end
update memory M with reservoir sampling
```
The instantaneous loss at iteration $k$ is

$$\ell_{\mathrm{RAR}}^{(t,k)}(\theta) = \sum_{(x,y)\in B_{t,k}} \ell\big(f_\theta(g_{t,k}(x)), y\big),$$

and aggregated over the $K$ iterations:

$$\mathcal{L}_{\mathrm{RAR}}^{(t)}(\theta) = \frac{1}{K} \sum_{k=1}^{K} \ell_{\mathrm{RAR}}^{(t,k)}(\theta).$$
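A runnable sketch of one RAR training step in PyTorch follows. The buffer methods `memory.sample` and `memory.update` are assumed APIs standing in for any reservoir-backed buffer, and the RandAugment settings are illustrative; the loop structure mirrors the pseudocode above.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def rar_step(model, optimizer, stream_x, stream_y, memory, K=3, mem_bs=10):
    """One RAR step: K rehearsal iterations, each with a freshly sampled
    memory batch and an independently drawn random augmentation."""
    augment = transforms.RandAugment(num_ops=1, magnitude=9)   # label-preserving transforms
    for _ in range(K):
        mem_x, mem_y = memory.sample(mem_bs)            # B^M_{t,k} ⊂ M  (assumed buffer API)
        x = torch.cat([stream_x, mem_x])                # joint batch B_{t,k}
        y = torch.cat([stream_y, mem_y])
        # RandAugment expects uint8 image tensors; a new transform is drawn per call.
        x = augment((x * 255).to(torch.uint8)).float() / 255
        loss = F.cross_entropy(model(x), y)             # ℓ_RAR over the augmented joint batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    memory.update(stream_x, stream_y)                   # reservoir-sampling refresh (assumed API)
```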
4. Loss-Landscape Approximation and Ridge Aversion
By averaging the memory loss across random, label-preserving transformations $g \sim G$, RAR operationalizes training on the augmented empirical risk

$$\mathcal{L}^{\mathrm{aug}}_{\mathcal{M}}(\theta) = \mathbb{E}_{g\sim G}\Big[\sum_{(x,y)\in\mathcal{M}} \ell\big(f_\theta(g(x)), y\big)\Big].$$

This averaging, as shown in Proposition 3, reduces variance and generalization error relative to static buffer rehearsal. Empirical visualization of the loss landscape reveals that under RAR the test-loss and memory-loss contours coincide; in particular, RAR solutions lie off the high-loss ridges that typify ER-based continual learning, indicating effective "ridge aversion."
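In practice the expectation over $G$ is approximated by Monte Carlo sampling. A minimal sketch, assuming a torchvision RandAugment transform and uint8-convertible image tensors (the function name and sample count are illustrative):

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def augmented_memory_risk(model, mem_x, mem_y, n_samples=8):
    """Monte Carlo estimate of E_{g~G}[memory loss under augmentation g];
    averaging over several sampled transforms reduces estimator variance."""
    augment = transforms.RandAugment(num_ops=1, magnitude=9)
    losses = []
    with torch.no_grad():
        for _ in range(n_samples):
            x = augment((mem_x * 255).to(torch.uint8)).float() / 255
            losses.append(F.cross_entropy(model(x), mem_y))
    return torch.stack(losses).mean()
```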
5. Reinforcement Learning for Hyperparameter Adaptation
RAR incorporates two principal hyperparameters: the rehearsal repetition count $K$ and the augmentation strength parameters (e.g., the number and magnitude of RandAugment operations). Automated selection is formulated as an online RL (multi-armed bandit) problem using bootstrapped policy gradient (BPG):
- State: stateless, with reset on each new task.
- Actions: discrete configurations of $(K, \text{augmentation strength})$.
- Reward: the negative absolute deviation of memory accuracy $a_{\mathcal{M}}$ from a target $\kappa$, i.e. $r = -\,\lvert a_{\mathcal{M}} - \kappa \rvert$.
Actions leading to accurate but not overfitted buffer performance are reinforced. The policy partitions actions into "better" and "worse" sets, and the signed policy-gradient formula iteratively adapts the hyperparameters, converging rapidly in practice.
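A minimal stateless-bandit sketch of this adaptation loop, using a plain softmax policy gradient as a stand-in for the paper's bootstrapped policy gradient; the candidate grid, target value, and the faked `memory_accuracy` are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate (K, augmentation-strength) configurations: the bandit's arms.
arms = [(1, 5), (2, 9), (3, 9), (5, 14)]
logits = np.zeros(len(arms))      # stateless softmax policy, reset at each new task
kappa = 0.8                       # illustrative target memory accuracy
lr, baseline = 1.0, 0.0

def memory_accuracy(K, strength):
    """Placeholder: would run a few RAR steps with (K, strength) and measure
    accuracy on the memory buffer; faked here for demonstration."""
    return float(np.clip(0.5 + 0.08 * K - 0.01 * strength + rng.normal(scale=0.03), 0.0, 1.0))

for step in range(300):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(len(arms), p=probs)            # sample a configuration
    r = -abs(memory_accuracy(*arms[a]) - kappa)   # reward: accurate but not overfitted
    grad = -probs.copy()
    grad[a] += 1.0                                # ∇_logits log π(a) for a softmax policy
    logits += lr * (r - baseline) * grad          # REINFORCE-style update with baseline
    baseline += 0.1 * (r - baseline)              # running reward baseline

print("preferred (K, strength):", arms[int(np.argmax(logits))])
```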
6. Empirical Evaluation
RAR was benchmarked on Seq-CIFAR100, Seq-Mini-ImageNet, CORE50-NC, and CLRS25-NC using fixed buffer sizes of 2K or 5K samples. Baseline ER attains end-of-sequence accuracies of approximately 19–24% (2K memory), while RAR with fixed hyperparameters achieves 28–39%, an absolute gain of 9–15 percentage points. When added on top of the advanced replay strategies MIR, ASER, and SCR, RAR further boosts performance by 7–18 points. Reweighted ER and distillation-based DER also show improvements of up to 33%.
Ablation studies indicate that "repeats-only" and "augmentation-only" variants yield inconsistent results contingent on the task-to-memory ratio $\rho$; only their combination in RAR is robustly advantageous across all datasets. Stability-plasticity analysis reveals that increasing $K$ enhances plasticity while reducing stability, with augmentation compensating for this trade-off and shifting the Pareto frontier outward.
RL-based adaptation of the hyperparameters (Table 3) outperforms online continual learning hyperparameter tuning (OCL-HT) by 4–6 points during initial tasks and adapts smoothly to the heightened overfitting risk as $\rho$ increases.
7. Practical Guidance and Recommendations
The task-to-memory ratio $\rho$ informs hyperparameter selection:
- Large $\rho$ (few memory slots per task): stronger augmentation, smaller $K$.
- Small $\rho$: moderate augmentation, larger $K$.
Empirical recommendations for most vision continual learning applications are one or two augmentation operations per image combined with a moderate repeat count $K$, with RL-based hyperparameter tuning providing an effective alternative to manual grid search at a modest increase in training time. A suitably chosen target memory accuracy $\kappa$ has proven effective for RL-based adaptation.
RAR functions as a modular augmentation for any rehearsal scheme, offering a principled remedy for the underfitting/overfitting trade-off and delivering consistent accuracy gains of 9–20% across replay algorithms and benchmarks (Zhang et al., 2022).