Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification (2307.03133v1)

Published 6 Jul 2023 in cs.LG and cs.CV

Abstract: Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, these methods are often evaluated under different settings, such as varying distribution shifts, backbones, and design scenarios, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g., online vs. offline adaptation, instance vs. batch vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
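Many of the benchmarked TTA methods (e.g., Tent) adapt a model at prediction time by minimizing the entropy of its predictions on unlabeled test batches, updating only a small set of parameters. The minimal sketch below illustrates that idea in dependency-free Python: it adapts a shared per-class bias by gradient descent on mean prediction entropy. All logits, shapes, and hyperparameters are illustrative assumptions; Tent itself updates batch-norm affine parameters of a deep network in PyTorch.

```python
# Sketch of entropy-minimization TTA (Tent-style): adapt a shared class bias
# on unlabeled test logits so that predictions become more confident.
# Hypothetical data; not the paper's implementation.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log p_i
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def adapt_bias(logits_batch, steps=50, lr=0.5):
    """Minimize mean prediction entropy over the batch w.r.t. a class bias b.

    Gradient of entropy w.r.t. logits: dH/dz_k = -p_k (log p_k + H),
    averaged over the batch to update the shared bias.
    """
    k = len(logits_batch[0])
    b = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for z in logits_batch:
            p = softmax([zi + bi for zi, bi in zip(z, b)])
            h = entropy(p)
            for j in range(k):
                grad[j] += -p[j] * (math.log(p[j]) + h)
        b = [bj - lr * gj / len(logits_batch) for bj, gj in zip(b, grad)]
    return b

# Unlabeled "test" logits from a hypothetical source model under shift:
batch = [[1.0, 0.2, 0.0], [0.8, 0.1, 0.3], [1.2, 0.0, 0.1]]
b = adapt_bias(batch)
before = sum(entropy(softmax(z)) for z in batch) / len(batch)
after = sum(entropy(softmax([zi + bi for zi, bi in zip(z, b)]))
            for z in batch) / len(batch)
```

After adaptation, `after` is strictly smaller than `before`: the bias shifts probability mass toward the classes the model already favors, which is exactly why fully unsupervised entropy minimization can collapse under severe shift and why consistent benchmarking across datasets and backbones matters.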

Authors (4)
  1. Yongcan Yu (3 papers)
  2. Lijun Sheng (14 papers)
  3. Ran He (172 papers)
  4. Jian Liang (162 papers)
Citations (10)