Repeated Augmented Rehearsal (RAR)
- RAR is a rehearsal-based approach to online continual learning that augments experience replay with repeated, independently augmented rehearsal iterations at each training step.
- By interleaving repeated augmentation and memory replay, it more accurately approximates the loss landscape of previously encountered tasks while balancing stability and plasticity.
- Reinforcement-learning-based hyperparameter adaptation yields significant performance improvements across a variety of continual learning benchmarks.
Repeated Augmented Rehearsal (RAR) is a rehearsal-based approach to online continual learning (OCL) designed to mitigate the underfitting–overfitting dilemma inherent in memory-constrained experience replay. RAR systematically interleaves multiple rehearsal iterations and online data augmentation at each training step to yield an improved empirical approximation of the loss landscape of previously encountered tasks. It further introduces reinforcement learning (RL)-based hyperparameter adaptation to dynamically balance stability and plasticity during training, achieving substantial improvements over baseline and state-of-the-art replay methods across a variety of benchmarks (Zhang et al., 2022).
1. Formalization of Online Continual Learning and Rehearsal Dynamics
In OCL, a model observes data via a non-stationary stream of mini-batches $B_1, B_2, \dots$ drawn from distributions $\mathcal{D}_1, \mathcal{D}_2, \dots$, each contiguous segment with a fixed distribution defining a "task." A finite memory buffer $\mathcal{M}$ of size $|\mathcal{M}|$ retains a subset of previously seen examples, with reservoir sampling used to refresh its contents.
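The buffer refresh is standard reservoir sampling, which guarantees every observed example the same probability of residing in the buffer. A minimal self-contained sketch in Python (function and variable names are illustrative, not from the paper):

```python
import random

def reservoir_update(memory, capacity, new_examples, n_seen):
    """Reservoir sampling: after n items, each one is stored with probability capacity/n."""
    for ex in new_examples:
        if len(memory) < capacity:
            memory.append(ex)                # buffer not yet full: always store
        else:
            j = random.randint(0, n_seen)    # uniform index over all items seen so far
            if j < capacity:
                memory[j] = ex               # replace a random slot w.p. capacity/(n_seen+1)
        n_seen += 1
    return memory, n_seen
```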
At each step $t$, two batches are sampled: the current stream batch $B_t$ and a memory batch $B^M_t \subset \mathcal{M}$. The cumulative empirical loss is minimized with a single pass over incoming data, but both batches are used for training:

$$\theta_{t+1} = \theta_t - \alpha \,\nabla_\theta \sum_{(x,y)\,\in\, B_t \cup B^M_t} \ell\big(f_\theta(x), y\big).$$
This approach, known as experience replay (ER), presents challenges due to the small and dynamic memory’s inability to fully represent past data distributions.
2. Theoretical Perspective: Dynamic Empirical Risk and Memory Overfitting
Analysis of ER in the online setting reveals that memory-based updates perform gradient descent on a biased, dynamic empirical risk of the form

$$\widetilde{\mathcal{L}}_t(\theta) = \mathcal{L}_{B_t}(\theta) + w_t\,\mathcal{L}_{\mathcal{M}}(\theta),$$

with the memory weight $w_t$ determined by the task-to-memory ratio $\rho$ and the stream position $t$. Here, $\rho$ reflects how much past data each buffer slot must represent, and $t$ tracks the sequence progression; together, they modulate the weighting of memory loss versus streaming data. Notably, the weight $w_t$ generally deviates from 1, introducing a risk of overfitting to the buffer. As the stream progresses, $t$ increases, but the relative scale set by $\rho$ remains, implying a persistent bias.
Multiple rehearsals ($K > 1$) of the same mini-batch exacerbate this overfitting, quickly driving the training loss on $B_t \cup B^M_t$ to zero while losses on the full past-task distributions improve only slowly, often placing solutions on high-loss ridges for past task distributions.
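The memory-overfitting effect is easy to reproduce in miniature. The toy sketch below (all data and names are synthetic and illustrative, not from the paper) repeatedly rehearses a tiny fixed buffer drawn from a "past task" and compares the buffer loss to the loss on the full past distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "past task": 50-D Gaussian features with noisy linear labels.
d, n = 50, 2000
w_true = rng.normal(size=d)
X_past = rng.normal(size=(n, d))
y_past = ((X_past @ w_true + rng.normal(scale=2.0, size=n)) > 0).astype(float)

buf = rng.choice(n, size=20, replace=False)   # tiny memory buffer: 20 of 2000 examples
w = np.zeros(d)

def log_loss(w, X, y):
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Repeated rehearsal on the fixed buffer only.
for _ in range(5000):
    Xb, yb = X_past[buf], y_past[buf]
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.5 * Xb.T @ (p - yb) / len(buf)     # logistic-regression gradient step

print(f"buffer loss:    {log_loss(w, X_past[buf], y_past[buf]):.3f}")  # driven near 0
print(f"past-task loss: {log_loss(w, X_past, y_past):.3f}")            # typically stays much higher
```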
3. Algorithmic Components of RAR
RAR introduces two main components at each step:
- Multiple rehearsal iterations per incoming mini-batch.
- Random data augmentation (e.g., RandAugment) independently applied at each iteration.
The training protocol at time $t$ is:
```
for k in 1 ... K:
    sample memory batch B^M_{t,k} ⊂ M
    form joint batch B_{t,k} = B_t ∪ B^M_{t,k}
    apply random augment: g_{t,k} ← sample from augmentation group G
    perform SGD update on loss:
        ℓ_RAR = ∑_{(x, y) ∈ B_{t,k}} ℓ( f_θ( g_{t,k}(x) ), y )
end
update memory M with reservoir sampling
```
The instantaneous loss at iteration $k$ is

$$\ell_{\mathrm{RAR}}^{(t,k)}(\theta) = \sum_{(x,y)\in B_{t,k}} \ell\big(f_\theta(g_{t,k}(x)), y\big),$$

and aggregated over the $K$ iterations:

$$\mathcal{L}_{\mathrm{RAR}}^{(t)}(\theta) = \frac{1}{K} \sum_{k=1}^{K} \ell_{\mathrm{RAR}}^{(t,k)}(\theta).$$
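A runnable sketch of one RAR training step in PyTorch follows. The buffer methods `memory.sample` and `memory.update` are assumed APIs standing in for any reservoir-backed buffer, and the RandAugment settings are illustrative; the loop structure mirrors the pseudocode above.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def rar_step(model, optimizer, stream_x, stream_y, memory, K=3, mem_bs=10):
    """One RAR step: K rehearsal iterations, each with a freshly sampled
    memory batch and an independently drawn random augmentation."""
    augment = transforms.RandAugment(num_ops=1, magnitude=9)   # label-preserving transforms
    for _ in range(K):
        mem_x, mem_y = memory.sample(mem_bs)            # B^M_{t,k} ⊂ M  (assumed buffer API)
        x = torch.cat([stream_x, mem_x])                # joint batch B_{t,k}
        y = torch.cat([stream_y, mem_y])
        # RandAugment expects uint8 image tensors; a new transform is drawn per call.
        x = augment((x * 255).to(torch.uint8)).float() / 255
        loss = F.cross_entropy(model(x), y)             # ℓ_RAR over the augmented joint batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    memory.update(stream_x, stream_y)                   # reservoir-sampling refresh (assumed API)
```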
4. Loss-Landscape Approximation and Ridge Aversion
By averaging the memory loss across random, label-preserving transformations $g \sim G$, RAR operationalizes training on the augmented empirical risk

$$\mathcal{L}^{\mathrm{aug}}_{\mathcal{M}}(\theta) = \mathbb{E}_{g\sim G}\Big[\sum_{(x,y)\in\mathcal{M}} \ell\big(f_\theta(g(x)), y\big)\Big].$$

This averaging, as shown in Proposition 3, reduces variance and generalization error relative to static buffer rehearsal. Empirical visualization of the loss landscape reveals that under RAR the test-loss and memory-loss contours coincide; in particular, RAR solutions lie off the high-loss ridges that typify ER-based continual learning, indicating effective "ridge aversion."
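In practice the expectation over $G$ is approximated by Monte Carlo sampling. A minimal sketch, assuming a torchvision RandAugment transform and uint8-convertible image tensors (the function name and sample count are illustrative):

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def augmented_memory_risk(model, mem_x, mem_y, n_samples=8):
    """Monte Carlo estimate of E_{g~G}[memory loss under augmentation g];
    averaging over several sampled transforms reduces estimator variance."""
    augment = transforms.RandAugment(num_ops=1, magnitude=9)
    losses = []
    with torch.no_grad():
        for _ in range(n_samples):
            x = augment((mem_x * 255).to(torch.uint8)).float() / 255
            losses.append(F.cross_entropy(model(x), mem_y))
    return torch.stack(losses).mean()
```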
5. Reinforcement Learning for Hyperparameter Adaptation
RAR incorporates two principal hyperparameters: the rehearsal repetition count $K$ and the augmentation strength parameters (e.g., the number and magnitude of RandAugment operations). Automated selection is formulated as an online RL (multi-armed bandit) problem using bootstrapped policy gradient (BPG):
- State: stateless, with reset on each new task.
- Actions: discrete configurations of $(K, \text{augmentation strength})$.
- Reward: the negative absolute deviation of memory accuracy $a_{\mathcal{M}}$ from a target $\kappa$, i.e. $r = -\,\lvert a_{\mathcal{M}} - \kappa \rvert$.
Actions leading to accurate but not overfitted buffer performance are reinforced. The policy partitions actions into "better" and "worse" sets, and the signed policy-gradient formula iteratively adapts the hyperparameters, converging rapidly in practice.
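A minimal stateless-bandit sketch of this adaptation loop, using a plain softmax policy gradient as a stand-in for the paper's bootstrapped policy gradient; the candidate grid, target value, and the faked `memory_accuracy` are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate (K, augmentation-strength) configurations: the bandit's arms.
arms = [(1, 5), (2, 9), (3, 9), (5, 14)]
logits = np.zeros(len(arms))      # stateless softmax policy, reset at each new task
kappa = 0.8                       # illustrative target memory accuracy
lr, baseline = 1.0, 0.0

def memory_accuracy(K, strength):
    """Placeholder: would run a few RAR steps with (K, strength) and measure
    accuracy on the memory buffer; faked here for demonstration."""
    return float(np.clip(0.5 + 0.08 * K - 0.01 * strength + rng.normal(scale=0.03), 0.0, 1.0))

for step in range(300):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(len(arms), p=probs)            # sample a configuration
    r = -abs(memory_accuracy(*arms[a]) - kappa)   # reward: accurate but not overfitted
    grad = -probs.copy()
    grad[a] += 1.0                                # ∇_logits log π(a) for a softmax policy
    logits += lr * (r - baseline) * grad          # REINFORCE-style update with baseline
    baseline += 0.1 * (r - baseline)              # running reward baseline

print("preferred (K, strength):", arms[int(np.argmax(logits))])
```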
6. Empirical Evaluation
RAR was benchmarked on Seq-CIFAR100, Seq-Mini-ImageNet, CORE50-NC, and CLRS25-NC using fixed buffer sizes of 2K or 5K samples. Baseline ER attains end-of-sequence accuracies of approximately 19–24% (2K memory), while RAR with fixed hyperparameters achieves 28–39%, an absolute gain of 9–15 percentage points. When added on top of the advanced replay strategies MIR, ASER, and SCR, RAR further boosts performance by 7–18 points. Reweighted ER and distillation-based DER also show improvements of up to 33%.
Ablation studies indicate that "repeats-only" and "augmentation-only" variants yield inconsistent results contingent on the task-to-memory ratio $\rho$; only their combination in RAR is robustly advantageous across all datasets. Stability-plasticity analysis reveals that increasing $K$ enhances plasticity while reducing stability, with augmentation compensating for this trade-off and shifting the Pareto frontier outward.
RL-based adaptation of the hyperparameters (Table 3) outperforms online continual learning hyperparameter tuning (OCL-HT) by 4–6 points during initial tasks and adapts smoothly to the heightened overfitting risk as $\rho$ increases.
7. Practical Guidance and Recommendations
The task-to-memory ratio $\rho$ informs hyperparameter selection:
- Large $\rho$ (few memory slots per task): stronger augmentation, smaller $K$.
- Small $\rho$: moderate augmentation, larger $K$.
Empirical recommendations for most vision continual learning applications are one or two augmentation operations per image combined with a moderate repeat count $K$, with RL-based hyperparameter tuning providing an effective alternative to manual grid search at a modest increase in training time. A suitably chosen target memory accuracy $\kappa$ has proven effective for RL-based adaptation.
RAR functions as a modular augmentation for any rehearsal scheme, offering a principled remedy for the underfitting/overfitting trade-off and delivering consistent accuracy gains of 9–20% across replay algorithms and benchmarks (Zhang et al., 2022).