Learning Robustness at Test-Time from a Non-Robust Teacher

Published 13 Apr 2026 in cs.CV | (2604.11590v1)

Abstract: Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: \emph{can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution?} To face this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. We further provide theoretical insights showing that our formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper introduces the Teacher-guided Robust Adaptation (TgRA) framework, leveraging a non-robust teacher to guide both clean and adversarial learning signals.
It demonstrates that aligning adversarial outputs with fixed teacher predictions stabilizes optimization and mitigates the moving target problem during test-time adaptation.
Empirical results on CIFAR-10 and ImageNet show TgRA substantially improves clean and robust accuracy compared to TRADES-U under distribution shifts.

Test-Time Robustness Adaptation from a Non-Robust Teacher

Problem Statement and Motivation

This work addresses the degradation of adversarial robustness under distribution shift for DNNs adapted at test time. The principal focus lies on unsupervised test-time adaptation (TTA), where a model, pretrained on a source distribution, is deployed in a target environment with access only to a small set of unlabeled samples. While TTA protocols can efficiently recover clean accuracy in the target domain, they rarely address—let alone preserve—adversarial robustness, especially when starting from non-robust backbones. The challenging question considered here is: Can robustness be improved via adaptation at test time when neither labels nor robust source models are available?

Technical Barriers

Classical adversarial training (AT) frameworks require label supervision, which is absent in the TTA scenario.
Robust distillation approaches assume a robust teacher, but in practical scenarios only non-robust models are available.
Direct adaptation of standard AT objectives (e.g., PGD, TRADES) to the unsupervised and non-robust teacher setting leads to unstable optimization, with high sensitivity to hyperparameters and weakened robustness-accuracy trade-offs.

Teacher-Guided Robust Adaptation (TgRA) Framework

To address these issues, the paper introduces Teacher-guided Robust Adaptation (TgRA), a label-free robustness adaptation paradigm that consistently uses a non-robust teacher as a fixed semantic anchor for both clean and adversarial learning signals.

After formalizing the target scenario—where only a non-robust pretrained model and unlabeled target samples are available—the paper specifies the two main routes to unsupervised test-time robust adaptation:

Student Self-consistency (e.g., TRADES-U): Replace the clean supervised loss with KL alignment against the teacher, but retain a standard TRADES self-consistency robustness regularizer (student output on clean vs. adversarial input).
Teacher-anchored Consistency (TgRA): Use the non-robust teacher as the reference for both the clean prediction loss and the robustness objective, i.e., the student’s prediction on adversarial examples aligns to the teacher’s prediction on clean examples.

The framework is summarized below.

Figure 1: The Teacher-guided Robust Adaptation (TgRA) framework adapts a student network at test time, using the non-robust teacher's predictions as targets for both clean and adversarial losses.

This approach differs from prior distillation-based adversarial training techniques [e.g. (Chen et al., 2021)] in two core aspects: (i) only non-robust teachers are available (realistic for TTA), and (ii) the teacher guides both objectives, eliminating the instability introduced by a moving self-referential student distribution.

Theoretical Analysis of Robustness Optimization Stability

A central contribution is the theoretical analysis of stability under different regularization strategies. Let $R_{\mathrm{self}}$ be the self-consistency robustness regularizer, and $R_{\mathrm{teach}}$ the teacher-anchored variant. The paper shows:

Self-consistency ( $R_{\mathrm{self}}$ ): Both input and reference distributions are parameter-dependent, creating a feedback loop where adversarial weight updates can destabilize clean predictions.
Teacher-anchoring ( $R_{\mathrm{teach}}$ ): The teacher output is fixed, making the gradient only depend on the adversarial branch and thus drastically reducing the moving-target effect.

This distinction has practical consequences:

The robustness-accuracy trade-off parameter $\beta$ introduces instability in TRADES-U, while in TgRA it primarily adjusts the trade-off without pronounced adverse effects.
Instability and performance collapse—particularly evident under non-robust initialization and large distribution shifts—are mitigated in TgRA due to the fixed reference.

Experimental Evaluation

The empirical evaluation spans CIFAR-10 and ImageNet under synthetic, controlled distribution shifts through photometric corruptions. Results are reported for varying adaptation set sizes, corruption severities, and $\beta$ values, and target both clean and robust accuracies under adaptive attacks (PGD-20, PGN, Square). Both WideResNet-34 and high-capacity ImageNet backbones (ResNet-50, ViT-B/16) are considered.

Robustness under Distribution Shift

Initial analyses show that even mild corruptions can significantly degrade adversarial robustness in standard pretrained models, while clean accuracy remains stable. This exposes a marked robustness-generalization gap:

Figure 3: On ImageNet, robust accuracy drops sharply under photometric shifts, while clean Top-1 accuracy is relatively invariant across corruption severities and $\epsilon$ budgets.

Numerical Results: CIFAR-10 and ImageNet

Extensive comparisons demonstrate the superiority of TgRA over the TRADES-U adaptation baseline:

Stability: Across all values of $\beta$ and severities, TgRA yields stable clean and robust accuracy curves, with minimal sensitivity to hyperparameters.
Performance: On CIFAR-10 with severe corruptions, TgRA outperforms TRADES-U by over 16 points in clean accuracy and 11 points in robust accuracy, demonstrating consistent advantages as the shift increases.
Data Scarcity Regime: When only limited target samples (e.g., 50% split of CIFAR-10 test set) are available, both baselines’ performance degrades, but that of TgRA remains considerably higher and less prone to collapse.

Figure 2: Training dynamics on CIFAR-10—TgRA maintains high clean and robust accuracy, especially under increased distribution shift and throughout the learning process.

Figure 4: On CIFAR-10, final clean and robust accuracy (PGD-20) demonstrate significant gains for TgRA as severity increases, compared to self-consistency adaptation.

ImageNet experiments show similar trends: teacher anchoring is robust to high photometric corruption severities and adversarial perturbation budgets, and performs at least on-par with, or surpasses, fully supervised robust fine-tuning with oracle labels.

Training Dynamics

Throughout adaptation, TgRA exhibits smoother optimization, rapid robustness recovery, and avoids the fast initial accuracy drop prevalent in TRADES-U. This is especially marked under non-zero distribution shift and non-robust initialization, confirming the theoretical arguments.

Practical and Theoretical Implications

This work demonstrates that robustness can be substantially improved at test time using only unlabeled target samples and non-robust supervision. The principled formulation of TgRA addresses the primary failure mode of prior test-time adversarial adaptation schemes—namely, the instability from self-consistency regularizers initialized from non-robust weights.

On the practical side:

This removes the necessity for robust source models, making adversarial robustness achievable in deployment-centric, real-world settings.
The resilience to hyperparameter tuning and data limitations increases viability in automation and safety-critical deployment—areas where labeled data is rare and distribution shift is common.

Theoretically, the moving target problem in robustness optimization is clarified and resolved; the separation between reference and adaptation branches proposed here could be adapted to other forms of post-deployment self-training.

Outlook and Future Directions

Advancing this framework opens several promising lines:

Enhanced teacher adaptation: More sophisticated or continual test-time adaptation strategies for the teacher could further amplify the effect, as teacher guidance quality directly impacts student robustness.
Continual and nonstationary environments: Extending to dynamic domains where the target distribution incessantly evolves calls for further analysis of anchor stability and adaptation efficacy.
Self-supervised and contrastive teacher signals: Exploring alternative label-free anchors for even greater generalization and adaptation flexibility in diverse modalities.

Conclusion

The paper establishes a new direction for adversarial robustness enhancement at test time, showing that with a simple and theoretically sound teacher-anchored adaptation objective, substantial clean and robust accuracy gains are possible, even from non-robust initialization and under significant distribution shift. These results have direct implications for robust deployment of machine learning systems in open-world and resource-limited settings, and lay groundwork for future research in robust, self-supervised, and continual post-deployment adaptation.

Reference:

"Learning Robustness at Test-Time from a Non-Robust Teacher" (2604.11590)

Markdown Report Issue