
Towards Stable Test-Time Adaptation in Dynamic Wild World (2302.12400v1)

Published 24 Feb 2023 in cs.LG and cs.CV

Abstract: Test-time adaptation (TTA) has shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, which are quite common in practice. In this paper, we investigate the unstable reasons and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaptation and result in collapsed trivial solutions, i.e., assigning the same class label for all samples. To address the above collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove partial noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably over prior methods and is computationally efficient under the above wild test scenarios.

Authors (7)
  1. Shuaicheng Niu (23 papers)
  2. Jiaxiang Wu (27 papers)
  3. Yifan Zhang (245 papers)
  4. Zhiquan Wen (3 papers)
  5. Yaofo Chen (14 papers)
  6. Peilin Zhao (127 papers)
  7. Mingkui Tan (124 papers)
Citations (205)

Summary

An Overview of Stability in Test-Time Adaptation

The paper "Towards Stable Test-Time Adaptation in Dynamic Wild World" addresses the critical issue of instability in Test-Time Adaptation (TTA) when models are deployed in real-world scenarios characterized by distribution shifts between training and testing data. TTA methods, which update models online using incoming test samples, encounter instability primarily due to their reliance on batch normalization (BN) layers. The paper examines the causes of this instability, particularly under mixed distribution shifts, small batch sizes, and imbalanced label distributions.

The research identifies the batch normalization layer as a significant hurdle to stable TTA. When batch sizes are small or distribution shifts are mixed, the per-batch estimates of mean and variance become inaccurate, so BN-based adaptation fails to update the model robustly. The authors propose that batch-agnostic normalization layers, such as group norm (GN) and layer norm (LN), can enhance stability because they do not depend on batch statistics.
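The small-batch failure mode is easy to see with a toy estimate of batch statistics. The plain-Python sketch below (illustrative only, no deep-learning framework) computes per-batch mean and variance the way a BN layer does from test inputs: a large batch recovers the true statistics well, while a tiny batch can deviate wildly, which is why BN-based TTA degrades at small batch sizes.

```python
import random

random.seed(0)
# Stand-in for one feature channel: activations drawn from N(0, 1),
# so the "true" normalization statistics are mean 0, variance 1.
population = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def batch_stats(batch):
    """Biased (BN-style) per-batch mean and variance estimate."""
    m = sum(batch) / len(batch)
    v = sum((x - m) ** 2 for x in batch) / len(batch)
    return m, v

big = population[:256]   # large test batch: estimates typically close to (0, 1)
tiny = population[:2]    # tiny test batch: estimates can be far off
print(batch_stats(big))
print(batch_stats(tiny))
```

Group and layer norm sidestep this entirely: they normalize each sample over its own channels, so the statistics are exact per-sample quantities regardless of batch size.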

Empirical studies in the paper reveal that models equipped with GN and LN layers demonstrate increased robustness compared to their BN counterparts, although challenges remain. The research finds that test-time entropy minimization, a common method in TTA, often leads to model collapse, particularly under severe distribution shifts. This collapse manifests as the model predicting a single class for all samples.
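The collapse mode follows directly from the shape of the entropy objective: a model that assigns the same confident one-hot prediction to every sample drives the entropy loss to its global minimum of zero, which is exactly the trivial solution described above. A minimal sketch of the per-sample entropy being minimized:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Confident prediction -> low entropy; uniform prediction -> maximal entropy.
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.168
print(entropy([0.25] * 4))                # log(4) ~ 1.386
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: the degenerate global minimum
```

Minimizing this loss online with no safeguards can therefore reward a model for becoming degenerately confident in a single class, especially when noisy, large-gradient test samples push it in that direction.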

To counteract these challenges, the authors propose a novel method called Sharpness-Aware and Reliable Entropy Minimization (SAR). SAR incorporates two key strategies: selective filtering of samples to remove outliers with large gradient norms and sharpness-aware learning to encourage the model to reach flatter minima. Flattening the entropy loss surface enhances the model's resilience to noisy gradients, allowing for more stable adaptations.
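The two ingredients can be sketched in framework-free Python. This is a simplified illustration, not the authors' implementation: the entropy threshold of 0.4·ln C and the sign-normalized ascent step are assumptions made here for brevity, and a single scalar weight stands in for the network's full parameter vector.

```python
import math

NUM_CLASSES = 1000  # e.g., ImageNet; assumption for this sketch
E0 = 0.4 * math.log(NUM_CLASSES)  # entropy threshold for "reliable" samples

def is_reliable(sample_entropy):
    """Filter out high-entropy samples, whose gradients tend to be large/noisy."""
    return sample_entropy < E0

def sharpness_aware_step(w, loss_grad, lr=0.01, rho=0.05):
    """One sharpness-aware update on a scalar weight (toy sketch).

    1) Ascend to a worst-case neighbor within radius rho (sign of the gradient).
    2) Descend from the original w using the gradient taken at that neighbor,
       steering the weight toward flatter minima.
    """
    g = loss_grad(w)
    w_adv = w + rho * (1.0 if g >= 0 else -1.0)
    return w - lr * loss_grad(w_adv)

# Usage on a toy quadratic loss L(w) = w^2 (gradient 2w): the update
# moves w toward the (flat) minimum at 0.
w_new = sharpness_aware_step(1.0, lambda w: 2.0 * w)
print(w_new)
```

In the actual method, samples failing the reliability check are simply skipped for that update, and the ascent/descent steps operate on the gradient vectors of the normalization-layer parameters rather than a scalar.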

The paper presents strong numerical results, demonstrating SAR's superiority over existing methods such as Tent, MEMO, and DDA across various wild test scenarios, including online imbalanced label shifts. SAR significantly improves accuracy while maintaining computational efficiency.

The theoretical implications of this research are noteworthy. By shifting the paradigm of test-time adaptation from batch dependency to a batch-agnostic approach, it challenges the traditional reliance on batch statistics for domain adaptation. Practically, this opens new avenues for deploying AI models in dynamic environments, such as autonomous vehicles or real-time data analysis, where robust model adaptation is critical.

Future developments could explore improving the efficiency of sharpness-aware learning and further enhancing model resilience in more complex real-world distributions. The open-source release of SAR potentially enables broader adoption and adaptation, fostering further research in robust machine learning applications.

In summary, this paper provides valuable insights and methodologies for enhancing TTA stability, proposing substantial shifts in existing adaptation techniques and offering new directions in leveraging batch-agnostic normalization for robust AI model deployment.