Prediction-Balanced Reservoir Sampling
- The paper introduces PBRS as a novel method for maintaining a prediction-balanced buffer to enable robust continual test-time adaptation on non-i.i.d. streams.
- PBRS employs prediction-balanced insertion and class-conditioned reservoir sampling to effectively mitigate overfitting and class imbalance in dynamic environments.
- Empirical evaluations demonstrate that PBRS outperforms prior TTA methods by significantly lowering error rates across benchmarks such as CIFAR10-C and ImageNet-C.
Prediction-Balanced Reservoir Sampling (PBRS) is an algorithmic method designed to mitigate overfitting and improve generalization in continual test-time adaptation (TTA) under temporally correlated (non-i.i.d.) data streams. Introduced as part of the NOTE framework for robust continual test-time adaptation, PBRS maintains a small memory buffer of test samples that approximates a class-balanced, nearly i.i.d.-like subsample of the non-i.i.d. test stream, with class membership inferred from the model's own predictions. This enables robust adaptation of normalization statistics in the presence of severe class imbalance and temporal correlation (Gong et al., 2022).
1. Motivation and Problem Setting
Continual TTA assumes a model operates under distribution shift, adapting on the fly using only the incoming stream of unlabeled test data. Many existing TTA algorithms rely on batch statistics (e.g., to recalibrate BatchNorm layers) or on entropy minimization over each batch. These methods tend to overfit when the data stream is non-i.i.d., such as the temporally correlated sequences common in real-world scenarios, because transient class imbalance biases the model toward the momentary majority classes. PBRS was introduced to address this by simulating an i.i.d. adaptation buffer through prediction- and time-balanced sample selection (Gong et al., 2022).
2. Algorithmic Structure of PBRS
PBRS maintains a fixed-capacity memory M = {(x_i, ŷ_i)} of size N, where x_i denotes a test sample and ŷ_i its current model-predicted class. For each new test point x_t with predicted label ŷ_t, PBRS applies two interleaved update mechanisms:
- Prediction-Balanced Insertion: If ŷ_t is underrepresented in M relative to other predicted classes, PBRS uniformly selects an instance belonging to a majority (over-represented) class in M and replaces it with (x_t, ŷ_t).
- Class-Conditioned Reservoir Sampling: If ŷ_t is not underrepresented, reservoir sampling is performed within class ŷ_t. For class c, the replacement probability for a newly seen sample is m_c / n_c, where m_c is the count of class c in M and n_c is the cumulative count of class-c predictions encountered so far.
This buffer replacement is performed online with O(1) per-sample overhead.
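The two update mechanisms can be sketched as follows. This is a minimal illustrative implementation, not the authors' reference code; the class name `PBRSBuffer` and its internal structure are assumptions for exposition.

```python
import random
from collections import defaultdict

class PBRSBuffer:
    """Illustrative sketch of Prediction-Balanced Reservoir Sampling.
    Stores (sample, predicted_label) pairs in a fixed-capacity memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []              # memory M: list of (x, y_hat) pairs
        self.seen = defaultdict(int)  # n_c: class-c predictions seen so far

    def _counts(self):
        counts = defaultdict(int)     # m_c: class-c instances currently in M
        for _, y in self.buffer:
            counts[y] += 1
        return counts

    def add(self, x, y_hat):
        self.seen[y_hat] += 1
        if len(self.buffer) < self.capacity:   # buffer filling phase
            self.buffer.append((x, y_hat))
            return
        counts = self._counts()
        if counts[y_hat] < max(counts.values()):
            # Prediction-balanced insertion: evict a random sample of
            # the (over-represented) majority class.
            majority = max(counts, key=counts.get)
            victims = [i for i, (_, y) in enumerate(self.buffer) if y == majority]
            self.buffer[random.choice(victims)] = (x, y_hat)
        else:
            # Class-conditioned reservoir sampling: accept with
            # probability m_c / n_c, evicting a same-class sample.
            if random.random() < counts[y_hat] / self.seen[y_hat]:
                same = [i for i, (_, y) in enumerate(self.buffer) if y == y_hat]
                self.buffer[random.choice(same)] = (x, y_hat)
```

Feeding a temporally skewed stream (a long run of one class followed by another) leaves the buffer evenly split between the two classes, which is the balancing behavior described above.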
3. Mathematical Formulation and Buffer Dynamics
Let n_c(t) denote the running total of test samples with model-predicted class c up to time t, and m_c the count of class-c instances in M. The insertion rules are as follows:
- For an incoming test example x_t with predicted label ŷ_t = c:
  - Buffer filling phase: If |M| < N, append (x_t, c) to M.
  - Prediction-balanced replacement: If c is a minority class in M, replace a randomly chosen sample carrying a majority label with (x_t, c).
  - Class-conditioned sampling: Otherwise, with probability m_c / n_c(t), replace a randomly chosen buffer sample with label c by (x_t, c).
Mathematically, the probability that x_t is admitted into M is 1 if |M| < N or c is a minority class in M, and m_c / n_c(t) otherwise.
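The time-uniformity of the class-conditioned rule can be checked empirically for a single class: replacing with probability m_c / n_c makes every sample of that class (roughly) equally likely to reside in the buffer, regardless of arrival time. A minimal simulation, with the function name `class_reservoir` chosen for illustration:

```python
import random
from collections import Counter

def class_reservoir(stream, capacity):
    """Within-class reservoir sampling sketch: once the buffer is full,
    a new sample replaces a random buffered one with probability m / n,
    where m is the buffer occupancy and n the number seen so far."""
    buf = []
    for n, x in enumerate(stream, start=1):
        if len(buf) < capacity:
            buf.append(x)
        elif random.random() < len(buf) / n:
            buf[random.randrange(len(buf))] = x
    return buf

# Over many runs, each of the 100 stream items should be retained with
# probability close to capacity / 100, independent of arrival time.
random.seed(0)
tally = Counter()
for _ in range(5000):
    tally.update(class_reservoir(range(100), 4))
```

Each item's expected retention count is 5000 × 4/100 = 200; early, middle, and late items land near that value, confirming the per-class time-uniform property.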
4. Integration with Continual Test-time Adaptation
PBRS operates in tandem with Instance-Aware Batch Normalization (IABN). After every N insertions, the buffer is used to recompute normalization statistics and to update the affine parameters via a single backward adaptation pass. The global mean and variance are updated using exponential moving averages:
μ ← (1 − m) μ + m μ̂,  σ² ← (1 − m) σ² + m σ̂²,
with momentum m. Here, μ̂ and σ̂² are calculated from the activations of the buffered samples. Only the affine parameters γ and β are optimized, via Adam. The buffer size N matches a common mini-batch size.
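The EMA update above can be written out for a single scalar feature; this is a pure-Python sketch (the function name and scalar simplification are illustrative, not the authors' code):

```python
def ema_update(mu, var, batch, momentum):
    """Exponential-moving-average update of normalization statistics
    from a list of buffered scalar activations (illustrative sketch)."""
    n = len(batch)
    mu_hat = sum(batch) / n                              # batch mean
    var_hat = sum((a - mu_hat) ** 2 for a in batch) / n  # batch variance
    mu = (1.0 - momentum) * mu + momentum * mu_hat
    var = (1.0 - momentum) * var + momentum * var_hat
    return mu, var
```

For example, with running statistics (μ = 0, σ² = 1), buffered activations [2.0, 4.0], and momentum m = 0.1, the batch statistics are μ̂ = 3 and σ̂² = 1, giving updated values μ = 0.3 and σ² = 1.0.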
5. Empirical Performance Evaluation
PBRS, in conjunction with IABN, was evaluated across multiple benchmarks featuring severe temporal class imbalance as well as real-world data streams. In the non-i.i.d. setting, the mean error rates achieved by NOTE (IABN + PBRS) were:
| Dataset | NOTE (IABN+PBRS) | Best Prior Baseline |
|---|---|---|
| CIFAR10-C | 21.1% | 36.2% (LAME) |
| CIFAR100-C | 47.0% | 63.3% (LAME) |
| ImageNet-C | 80.6% | 82.7% |
| KITTI-Rain | 10.9% | 11.3% |
| HARTH | 51.0% | 61.0% |
| ExtraSensory | 45.4% | 50.7% |
NOTE outperforms all other TTA methods (BN-Stats, ONDA, PL, TENT, LAME, CoTTA) in non-i.i.d. streams, and matches or surpasses them when the i.i.d. assumption holds (Gong et al., 2022).
6. Theoretical and Empirical Properties
PBRS does not come with formal unbiasedness proofs, but it empirically maintains class frequencies in the buffer close to the long-term average as predicted by the model. The class-conditioned reservoir sampling ensures per-class time-uniform sampling, and the prediction-balanced policy prevents any class from dominating under severe drift. Ablation studies show a near-uniform class distribution in M even under pronounced temporal skew, underpinning robust adaptation dynamics (Gong et al., 2022).
7. Implementation Details and Practical Considerations
Key parameters include the buffer size N, the BatchNorm EMA momentum m, and the Adam learning rate used for adaptation. Storage requirements are minimal: only N (sample, predicted-label) pairs, with O(1) computational cost per sample for buffer management and a single forward-backward pass per adaptation step (triggered every N samples). PBRS is always paired with IABN for maximum robustness in the non-i.i.d. paradigm (Gong et al., 2022).