ATAC: Test-time Adversarial Correction

Updated 28 November 2025

The paper introduces a novel test-time correction that exploits augmentation-induced drift to recover performance under adversarial attacks.
It computes the mean drift and employs angular consistency to adjust the CLIP embeddings, maintaining robustness with minimal computational overhead.
Empirical results show a dramatic improvement in robust accuracy (up to 80.94%) with only a slight drop in clean accuracy, validating ATAC's effectiveness.

Augmentation-based Test-time Adversarial Correction (ATAC) constitutes a framework of test-time defenses designed to recover the performance of pretrained deep learning models in the presence of adversarial or distributionally-shifted inputs by leveraging augmentation-induced effects. These schemes intervene exclusively during inference, require no retraining or access to original training data, and infer correction updates by analyzing how model representations respond to label-preserving input augmentations. Recent work has instantiated ATAC methods for both vision-LLMs, notably CLIP, as well as for streaming self-learning settings with adaptive augmentation, significantly improving adversarial robustness with modest computational overhead (Su et al., 21 Nov 2025, Tomar et al., 2023).

1. Problem Setting and Motivation

ATAC methods address the vulnerability of large-scale pretrained models to adversarial perturbations and distribution shift at test time. In high-profile cases such as CLIP, small $\ell_\infty$ attacks can reduce zero-shot image classification accuracy to below 0.1%, with retraining or adversarial fine-tuning being computationally prohibitive and often harming generalization. Instead, ATAC frameworks intervene at inference by correcting representations or decisions of the frozen model, exploiting the model's latent response to structured augmentations of the input (Su et al., 21 Nov 2025).

Key motivation for ATAC includes:

Maintaining test-time adaptability with no additional training or source data requirements.
Achieving state-of-the-art adversarial robustness while incurring only minor computational and accuracy overhead.
Providing a generic defense side-stepping the limitations of prior adversarial retraining or lightweight defenses.

2. Core Methodology: Embedding Drift and Correction in CLIP

The ATAC framework for CLIP (Su et al., 21 Nov 2025) is centered around analyzing the augmentation-induced drift in the model's embedding space. Let $x$ denote the input image, potentially adversarial, and let $\{A_i\}_{i=1}^n$ be a set of label-preserving image augmentations (horizontal flip, small rotations). For each augmented input $x_i = A_i(x)$ , CLIP's normalized image encoder outputs $f_x = E_I(x), f_{x_i} = E_I(x_i)$ .

The drift vector for each augmentation is defined as

$d_i = f_x - f_{x_i}$

where $d_i$ measures how the embedding shifts under augmentation. For clean images, these drifts are scattered; for adversarially perturbed images, drifts align in a coherent direction.

The mean drift is

$\bar d = \frac{1}{n} \sum_{i=1}^n d_i$

and the angular consistency

$\tau = \frac{1}{n} \sum_{i=1}^n \cos(d_i, \bar d)$

approximates 1 for adversarial (aligned) examples and is low for clean (scattered). When $\tau > \tau^*$ , a threshold, $\bar d$ is interpreted as a semantic recovery direction; the corrected embedding is

$f^* = \mathrm{norm}(f_x + \alpha \bar d)$

with step size $\alpha$ . If $\tau \le \tau^*$ , no correction is applied and $f^* = f_x$ .

This corrected embedding $f^*$ is then used for the CLIP zero-shot classifier, replacing the original $f_x$ .

3. Algorithmic Details and Computational Considerations

The ATAC correction for a single input proceeds as follows (Su et al., 21 Nov 2025):

Compute $f_x = E_I(x)$ .
For each $i = 1, \ldots, n$ $i = 1, \dots, n$ :
- Generate augmented input $x_i = A_i(x)$ .
- Compute $f_{x_i} = E_I(x_i)$ .
- Compute $d_i = f_x - f_{x_i}$ .
Calculate mean drift $\bar d$ and angular consistency $\tau$ .
If $\tau > \tau^*$ , produce $f^* = \mathrm{norm}(f_x + \alpha \bar d)$ ; else $f^*=f_x$ .
Classify with $\arg\max_i \cos(f^*, t_i)$ over label embeddings $\{t_i\}$ .

Typical hyperparameters:

$n=5$ augmentations (e.g., flip, $\pm15^\circ$ , $\pm30^\circ$ rotations)
$\alpha=7$
$\tau^*=0.85$

The computational cost is $n+1$ CLIP forward passes per example (≈18 ms per image on CLIP-ViT-B/32). ATAC requires neither fine-tuning nor access to labeled training data.

4. Empirical Results and Robustness Analysis

Experimental evaluation across 13 zero-shot classification benchmarks, including CIFAR-10, CIFAR-100, Caltech101/256, OxfordPets, Flowers102, STL-10, and others, demonstrates that ATAC achieves a substantial increase in robust accuracy under adversarial PGD( $\ell_\infty, \epsilon=4/255$ ) attacks:

Method	Avg Robust Acc. (%)	Clean Acc. Drop
CLIP	0.11	—
CLIP-FT	1.17	—
TeCoA	8.54	—
PMG-AFT	9.87	—
FARE	4.14	—
TTE	8.63	—
TTC	19.74	—
R-TPT	33.05	—
ATAC	80.94	2–5 pts

Clean accuracy with ATAC decreases by 2–5 percentage points, consistent with the robustness–accuracy trade-off.

ATAC further demonstrates resilience in extreme threat models:

Large-budget PGD ( $\epsilon=127.5/255$ ): 88.74%.
Early-stopped PGD: 54.13%.
Unsupervised PGD: 37.51%.
Targeted PGD: 35.44%.

Adaptive attacks designed to trigger or avoid the correction yield robust accuracies of 7.2% and 24.3%, respectively, while requiring substantially more attack iterations than prior methods (Su et al., 21 Nov 2025).

5. Relation to Other Augmentation-Based Test-Time Defenses

In test-time self-learning systems, adaptation can be driven by automatic adversarial augmentation, as exemplified by TeSLA (Tomar et al., 2023). TeSLA learns a distribution over augmentation policies (sub-sequences of image operations), selecting those that maximize prediction entropy, thereby pushing model features to the decision boundary. The feature distribution is regularized with a multi-layer mean feature penalty to maintain semantic content.

In TeSLA's ATAC module, for each test batch:

Sample augmentation policies for each instance.
Update policy parameters by minimizing an adversarial objective via online policy gradients.
Generate "hard" augmented examples and perform knowledge distillation from a teacher model (EMA of the student).
The main optimization targets mutual information between predictions and inputs, with a "flipped" cross-entropy loss and marginal entropy maximization, ensuring adaptation remains well-calibrated and insensitive to the architecture or source pretraining (Tomar et al., 2023).

TeSLA exhibits consistent state-of-the-art performance in classification and segmentation benchmarks under corruption and domain shift, and also improves calibration properties, as measured by Brier score, NLL, and expected calibration error.

6. Limitations and Scope

While ATAC for CLIP is training-free, efficient, and yields dramatic robustness gains, it is predicated on the assumption that chosen augmentations are label-preserving. Severe image corruptions or mismatched augmentation domains can undermine the angular consistency gating and degrade effectiveness. The current ATAC instantiation is tailored to CLIP's joint embedding setup; direct extension to unrelated architectures or dense prediction tasks requires further investigation (Su et al., 21 Nov 2025). In the TeSLA framework, there is sensitivity to the design of augmentation policy search and regularization, and real-time efficiency is achieved only through judicious architectural and algorithmic choices (Tomar et al., 2023).

7. Research Directions and Future Prospects

Potential avenues for advancing ATAC include:

Integration with lightweight fine-tuning or adaptive data purification to achieve stronger or provable guarantees.
Extending the latent-drift inference paradigm to other vision-language or multi-modal models.
Exploring theoretically-grounded links between augmentation-based drift geometry and adversarial vulnerability.
Systematic paper of augmentation-robustness across domains where semantic label preservation is ambiguous.

Sustained empirical and theoretical investigation will likely solidify ATAC as a key paradigm in the landscape of test-time adversarial correction and adaptive robustness (Su et al., 21 Nov 2025, Tomar et al., 2023).