Holistic Proxy-based Contrastive Replay

Updated 5 February 2026
  • HPCR is a holistic method integrating proxy-based contrastive mechanisms with multiple augmented rehearsals to address key challenges in online continual learning.
  • It systematically interleaves repeated augmentation steps and memory replay to accurately approximate the loss landscape while balancing stability and plasticity.
  • By using reinforcement learning for hyperparameter adaptation, HPCR achieves significant performance improvements across various continual learning benchmarks.

Repeated Augmented Rehearsal (RAR) is a rehearsal-based approach to online continual learning (OCL) designed to mitigate the underfitting–overfitting dilemma inherent in memory-constrained experience replay. RAR systematically interleaves multiple rehearsal iterations and online data augmentation at each training step to yield an improved empirical approximation of the loss landscape of previously encountered tasks. It further introduces reinforcement learning (RL)-based hyperparameter adaptation to dynamically balance stability and plasticity during training, achieving substantial improvements over baseline and state-of-the-art replay methods across a variety of benchmarks (Zhang et al., 2022).

1. Formalization of Online Continual Learning and Rehearsal Dynamics

In OCL, a model $f_\theta: \mathcal X \to \mathbb R^C$ observes data via a non-stationary stream of mini-batches $\mathcal B_t = \{(x_i, y_i)\}_{i=1}^B$ drawn from distributions $\mathbb P(\mathcal D_t)$, each defining a "task" when the distribution is fixed within contiguous segments. A finite memory buffer $M$ of size $|M|$ retains a subset of previously seen examples, with reservoir sampling for content refresh.
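The fixed-size buffer with reservoir sampling can be sketched as follows. This is a minimal illustration; the class and method names are our own, not from the paper:

```python
import random

class ReservoirBuffer:
    """Fixed-capacity memory updated by reservoir sampling.

    After n >= capacity items have streamed past, each item has been
    retained with probability capacity / n, giving a uniform sample
    of the stream so far.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a random slot with probability capacity / n_seen.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, batch_size):
        """Draw a memory batch uniformly from the buffer."""
        k = min(batch_size, len(self.data))
        return self.rng.sample(self.data, k)
```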

At each step $t$, two batches are sampled: the current stream batch $\mathcal B_t \sim \mathcal D_{\rm cur}$ and a memory batch $\mathcal B^M_t \sim M$. The cumulative empirical loss is minimized with a single pass over incoming data, but both batches are used for training:

$$\theta \leftarrow \theta - \eta\left[ \frac{1}{|\mathcal B_t|} \sum_{(x, y) \in \mathcal B_t} \nabla \mathcal L(f_\theta(x), y) + \frac{1}{|\mathcal B^M_t|} \sum_{(x, y) \in \mathcal B^M_t} \nabla \mathcal L(f_\theta(x), y) \right]$$

This approach, known as experience replay (ER), presents challenges due to the small and dynamic memory’s inability to fully represent past data distributions.
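The ER update, one SGD step over the averaged gradients of the stream and memory batches, can be sketched for a toy scalar model $f_\theta(x) = \theta x$ under squared loss. This is a pure-Python illustration under our own simplifications, not a full implementation:

```python
def er_update(theta, stream_batch, memory_batch, lr=0.1):
    """One experience-replay step for the scalar model f(x) = theta * x
    under squared loss, averaging the gradient within each batch as in
    the ER update rule."""
    def batch_grad(batch):
        # d/dtheta (theta*x - y)^2 = 2 * (theta*x - y) * x, averaged over the batch
        return sum(2 * (theta * x - y) * x for x, y in batch) / len(batch)

    return theta - lr * (batch_grad(stream_batch) + batch_grad(memory_batch))
```

With consistent data (e.g. all pairs satisfying $y = 2x$), repeated updates converge to $\theta \approx 2$; with a small, unrepresentative memory batch, the memory term can instead pull $\theta$ toward whatever the buffer happens to contain, which is the overfitting risk discussed next.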

2. Theoretical Perspective: Dynamic Empirical Risk and Memory Overfitting

Analysis of ER in the online setting reveals that memory-based updates perform gradient descent on a biased, dynamic empirical risk:

$$\mathcal R_t(\theta) = \sum_{(x, y) \in \mathcal D_{\rm cur}} \mathcal L(f_\theta(x), y) + \beta_t \lambda \sum_{(x, y) \in M^0} \mathcal L(f_\theta(x), y)$$

with

$$\lambda = \frac{|\mathcal D_{\rm cur}|}{|M^0|}, \quad \beta_t = \frac{1}{1 + 2N_{\rm cur}^t / N_{\rm past}}$$

Here, $\lambda$ reflects the task-to-memory ratio and $\beta_t$ tracks the sequence progression; together they modulate the weighting of memory loss versus streaming data. Notably, the effective weight $\beta_t \lambda$ generally deviates from 1, introducing a risk of overfitting to the buffer. As the stream progresses and $N_{\rm past}$ grows, $\beta_t$ increases, while the scale factor $\lambda$ persists, so the bias does not vanish.
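Plugging representative sizes into the formulas above shows how far the effective memory weight $\beta_t \lambda$ drifts from 1 (the sample sizes below are hypothetical, chosen only to illustrate the formulas):

```python
def effective_memory_weight(d_cur, m0, n_cur_t, n_past):
    """Effective weight beta_t * lambda on the memory loss term."""
    lam = d_cur / m0                               # task-to-memory ratio
    beta_t = 1.0 / (1.0 + 2.0 * n_cur_t / n_past)  # sequence-progression factor
    return beta_t * lam

# e.g. 10,000 samples in the current task, a 2,000-slot buffer,
# 5,000 current-task samples seen so far, 50,000 past samples:
w = effective_memory_weight(d_cur=10_000, m0=2_000, n_cur_t=5_000, n_past=50_000)
# lambda = 5, beta_t = 1/1.2, so w is about 4.17 -- well above 1
```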

Multiple rehearsals ($K > 1$) of the same mini-batch exacerbate these overfitting issues, quickly driving the loss on $\mathcal B_t$ to zero while memory losses degrade slowly, often placing solutions on high-loss ridges of past task distributions.

3. Algorithmic Components of RAR

RAR introduces two main components at each step:

  • Multiple rehearsal iterations $k = 1, \ldots, K$ per incoming mini-batch.
  • Random data augmentation $\mathcal A_{t,k}$ (e.g., RandAugment) independently applied at each iteration.

The training protocol at time $t$ is:

for k in 1 ... K:
    sample memory batch B^M_{t,k} ~ M
    form joint batch B_{t,k} = B_t ∪ B^M_{t,k}
    sample random augmentation g_{t,k} from the augmentation group G
    perform SGD update on the loss:
        ℓ_RAR = Σ_{(x, y) ∈ B_{t,k}} ℓ( f_θ( g_{t,k}(x) ), y )
end
update memory M with reservoir sampling

The instantaneous loss at iteration $k$ is $\mathcal L^{(k)}(\theta) = \frac{1}{|B_{t,k}|} \sum_{(x, y) \in B_{t,k}} \ell\bigl(f_\theta(\mathcal A_{t,k}(x)), y\bigr)$, and the loss aggregated over the $K$ iterations is $\ell_{\mathrm{RAR}}(\theta; (x, y)) = \sum_{k=1}^K \ell\bigl(f_\theta(\mathcal A_{t,k}(x)), y\bigr)$.
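The inner loop above can be sketched as runnable code using a toy scalar model, with a small random input jitter standing in for RandAugment. All names and simplifications here are ours, not the paper's implementation:

```python
import random

def rar_step(theta, stream_batch, memory, K=5, batch_size=2, lr=0.05, rng=None):
    """One RAR step: K rehearsal iterations over the same stream batch,
    each with a freshly sampled memory batch and a fresh random augmentation.
    Model: f(x) = theta * x under squared loss."""
    rng = rng or random.Random(0)
    for _ in range(K):
        mem_batch = rng.sample(memory, min(batch_size, len(memory)))
        joint = stream_batch + mem_batch
        # "Augmentation": small random jitter, a stand-in for RandAugment.
        eps = rng.uniform(-0.05, 0.05)
        grad = sum(2 * (theta * (x + eps) - y) * (x + eps)
                   for x, y in joint) / len(joint)
        theta -= lr * grad
    return theta
```

Each call performs $K$ SGD updates on the joint batch, so the incoming mini-batch is rehearsed $K$ times against different memory samples and different transforms, which is the mechanism that smooths the effective loss surface.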

4. Loss-Landscape Approximation and Ridge Aversion

By averaging memory loss across random, label-preserving transformations, RAR operationalizes training on the augmented empirical risk:

$$\bar{\mathcal R}_t(\theta) = \sum_{(x, y) \in \mathcal D_{\rm cur}} \int_G \ell(f_\theta(gx), y)\, d\mathbb Q(g) + \beta_t \lambda \sum_{(x, y) \in M^0} \int_G \ell(f_\theta(gx), y)\, d\mathbb Q(g)$$

This averaging, as shown (Proposition 3), reduces the variance and generalization error relative to static buffer rehearsal. Empirical visualization of the loss landscape reveals that RAR-aligned test-loss and memory-loss contours coincide; in particular, RAR solutions lie off the high-loss ridges that typify ER-based continual learning, indicating effective "ridge aversion."
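The variance-reduction effect of averaging over transforms can be checked numerically: a Monte Carlo average of the loss over several random draws from the augmentation distribution estimates the integrals above with much lower variance than a single draw. A toy check with a scalar model and our own jitter "augmentation":

```python
import random
import statistics

def loss(theta, x, y):
    return (theta * x - y) ** 2

def single_draw(theta, x, y, rng):
    """Loss at one random transform g ~ Q (here, additive jitter)."""
    g = rng.uniform(-0.5, 0.5)
    return loss(theta, x + g, y)

def averaged(theta, x, y, rng, n=8):
    """Monte Carlo average over n independent transforms."""
    return sum(single_draw(theta, x, y, rng) for _ in range(n)) / n

rng = random.Random(0)
singles = [single_draw(1.5, 1.0, 2.0, rng) for _ in range(2000)]
avgs = [averaged(1.5, 1.0, 2.0, rng) for _ in range(2000)]
# var(avgs) is roughly var(singles) / 8, as expected for a mean of 8 iid draws
```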

5. Reinforcement Learning for Hyperparameter Adaptation

RAR incorporates two principal hyperparameters: the rehearsal repetition count $K$ and the augmentation strength parameters $(P, Q)$. Automated selection is formulated as an online RL (multi-armed bandit) problem solved with bootstrapped policy gradient (BPG):

  • State: stateless, with reset on each new task.
  • Actions: discrete configurations of $(K, P, Q)$.
  • Reward: absolute deviation of memory accuracy $A_M(t)$ from a target $A^*_M$,

$$r_t = |A_M(t) - A^*_M|$$

Actions leading to accurate but not overfitted buffer performance are reinforced. The policy $\pi_w(a) \propto \exp w_a$ partitions actions into "better" and "worse" sets, and a signed policy-gradient update iteratively adapts the hyperparameters, converging rapidly in practice.
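A simplified stand-in for this update is a vanilla softmax-bandit policy gradient (REINFORCE with a baseline), where each arm is a candidate $(K, P, Q)$ configuration and the reward penalizes deviation of memory accuracy from the target. This is our own simplification, not the paper's bootstrapped variant:

```python
import math

def softmax(w):
    """Numerically stable softmax over weight vector w."""
    m = max(w)
    e = [math.exp(x - m) for x in w]
    s = sum(e)
    return [x / s for x in e]

def bandit_update(w, action, reward, baseline, lr=0.5):
    """REINFORCE-style update for a stateless softmax policy pi_w(a) ∝ exp(w_a).

    Actions with reward above the baseline have their weight raised,
    the rest lowered -- a smooth version of the "better"/"worse" split.
    """
    probs = softmax(w)
    adv = reward - baseline
    return [wi + lr * adv * ((1.0 if i == action else 0.0) - probs[i])
            for i, wi in enumerate(w)]
```

In use, one would map each arm index to a configuration such as `(K, P, Q) = (10, 1, 14)` and feed rewards like `reward = -abs(mem_acc - 0.9)` after evaluating buffer accuracy, so arms that keep memory accuracy near the target accumulate weight.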

6. Empirical Evaluation

RAR was benchmarked on Seq-CIFAR100, Seq-Mini-ImageNet, CORE50-NC, and CLRS25-NC with fixed buffer sizes of 2K or 5K samples. Baseline ER reaches end-of-sequence accuracies of approximately 19–24% (2K memory), while RAR (with fixed $K=10$, $P=1$, $Q=14$) achieves 28–39%, an absolute gain of 9–15%. When added on top of the advanced replay strategies MIR, ASER, and SCR, RAR further boosts performance by 7–18%. Reweighted ER and distillation-based DER also show improvements of up to 33%.

Ablation studies indicate that "repeats-only" or "augmented-only" variants offer inconsistent results contingent on the task-to-memory ratio $\lambda$; only their combination in RAR is robustly advantageous across all datasets. Stability-plasticity analysis reveals that increasing $K$ enhances plasticity while reducing stability, with augmentation compensating for the trade-off and shifting the Pareto frontier outward.

RL-based adaptation of hyperparameters (Table 3) outperforms online continual learning hyperparameter tuning (OCL-HT) by 4–6 points during initial tasks and adapts smoothly to the heightened overfitting risk as $\beta_t$ increases.

7. Practical Guidance and Recommendations

The task-to-memory ratio $\lambda$ informs hyperparameter selection:

  • Large $\lambda$ (few memories per task): stronger augmentation $Q$, smaller $K$.
  • Small $\lambda$: moderate augmentation, larger $K$.
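These heuristics can be encoded as a small rule of thumb. The thresholds and exact values below are our own illustrative choices, drawn only from the qualitative guidance above:

```python
def suggest_hparams(lam):
    """Heuristic (K, P, Q) choice from the task-to-memory ratio lambda.

    Large lambda (few memories per task) -> stronger augmentation Q, smaller K;
    small lambda -> moderate augmentation, larger K. Thresholds are illustrative.
    """
    if lam > 10:          # few buffer slots relative to the task
        return {"K": 5, "P": 1, "Q": 20}
    elif lam > 2:
        return {"K": 10, "P": 1, "Q": 14}
    else:                 # ample memory relative to task size
        return {"K": 15, "P": 2, "Q": 10}
```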

Empirical recommendations for most vision continual learning applications are $K \in [5, 15]$, $P = 1$ or $2$, $Q \in [10, 20]$, with RL-based hyperparameter tuning providing an effective alternative to manual grid search at modest computational cost (approximately $2\times$ training time). A target memory accuracy of $A^*_M \approx 0.9$ has proven effective for RL-based adaptation.

RAR functions as a modular augmentation for any rehearsal scheme, offering a principled remedy for the underfitting/overfitting trade-off and delivering consistent accuracy gains of 9–20% across replay algorithms and benchmarks (Zhang et al., 2022).
