Test-Time Model Adaptation with Only Forward Passes (2404.01650v2)

Published 2 Apr 2024 in cs.LG

Abstract: Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to solely learn a newly added prompt (as the model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function by measuring test-training statistic discrepancy and model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, making them align with the source training domain, thereby further enhancing adaptation performance. Without using any backpropagation or altering model weights, FOA running on a quantized 8-bit ViT outperforms gradient-based TENT on a full-precision 32-bit ViT, while achieving up to a 24-fold memory reduction on ImageNet-C.


Summary

  • The paper proposes the Forward-Optimization Adaptation (FOA) method, which learns a newly added input prompt via CMA-ES using only forward passes.
  • It introduces a fitness function that combines test-training statistic discrepancy with model prediction entropy for stable online, unsupervised adaptation.
  • On ImageNet-C, FOA applied to a quantized 8-bit ViT outperforms gradient-based Tent on the full-precision 32-bit model while reducing memory usage by up to 24-fold.

Overview of "Test-Time Model Adaptation with Only Forward Passes"

The paper, "Test-Time Model Adaptation with Only Forward Passes," addresses the critical issue of adapting deep neural networks at the test-time to cope with distribution shifts, without leveraging backward propagation. This innovation is particularly pertinent for deployment scenarios involving resource-limited devices like FPGAs and quantized models, where backpropagation is infeasible due to hardware constraints.

Methodology

The core contribution of the paper is the Forward-Optimization Adaptation (FOA) method. FOA tackles test-time adaptation using forward passes only, leveraging the derivative-free covariance matrix adaptation evolution strategy (CMA-ES). Rather than relying on computation-intensive backpropagation, FOA updates only a newly introduced prompt at the model's input, leaving the model structure and weights untouched.
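
To make the forward-only update concrete, the sketch below shows how a prompt could be optimized per test batch with the `cma` package's ask/tell interface. The prompt shape, the `model(batch, prompt=...)` call returning logits and features, and the `fitness` callable are illustrative assumptions, not the paper's exact interface.

```python
import cma
import torch

# Illustrative sizes, not taken from the paper.
PROMPT_LEN, EMBED_DIM = 3, 768
x0 = torch.zeros(PROMPT_LEN * EMBED_DIM).numpy()

# CMA-ES over the flattened prompt; sigma0 is the initial step size.
es = cma.CMAEvolutionStrategy(x0, sigma0=0.1)

def adapt_on_batch(model, batch, fitness, es, n_iters=1):
    """Update the prompt for one incoming test batch using forward passes only."""
    for _ in range(n_iters):
        candidates = es.ask()                                   # sample candidate prompts
        scores = []
        for cand in candidates:
            prompt = torch.tensor(cand, dtype=torch.float32).view(PROMPT_LEN, EMBED_DIM)
            with torch.no_grad():                               # no backpropagation anywhere
                logits, feats = model(batch, prompt=prompt)     # assumed model interface
            scores.append(fitness(logits, feats))               # lower is better
        es.tell(candidates, scores)                             # CMA-ES distribution update
    return es.result.xbest                                      # best prompt seen so far
```

In an online stream, `adapt_on_batch` would be called once per arriving batch, so the search distribution carries over between batches rather than restarting.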

To achieve efficient and stable adaptation in an online, unsupervised setting, the authors design a fitness function that combines test-training statistic discrepancy with model prediction entropy. Additionally, FOA employs an "activation shifting" scheme that aligns the activations of test samples with those of the source training domain, further improving adaptation performance.
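
The sketch below gives one plausible rendering of such a fitness function and of the activation-shifting step. The specific discrepancy measure, the layer from which feature statistics are taken, and the weighting `lam` are assumptions here rather than the paper's exact formulation; `source_mean` and `source_var` stand for statistics collected on source/training data.

```python
import torch

def make_fitness(source_mean, source_var, lam=1.0):
    """Build the fitness callable used by the CMA-ES loop sketched above."""
    def fitness(logits, feats):
        # Test-training statistic discrepancy on the current batch's features.
        test_mean = feats.mean(dim=0)
        test_var = feats.var(dim=0, unbiased=False)
        discrepancy = (test_mean - source_mean).abs().mean() \
                      + (test_var - source_var).abs().mean()

        # Entropy of the model's predictions; lower means more confident.
        probs = logits.softmax(dim=-1)
        entropy = -(probs * (probs + 1e-8).log()).sum(dim=-1).mean()

        # Lower fitness is better; lam balances the two terms.
        return (discrepancy + lam * entropy).item()
    return fitness

def shift_activations(feats, source_mean):
    """Activation shifting (sketch): translate test-batch features so their
    mean matches the source-domain mean before they reach the classifier head."""
    return feats + (source_mean - feats.mean(dim=0, keepdim=True))
```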

Results

The paper presents compelling numerical results to substantiate the efficacy of FOA. On ImageNet-C, FOA applied to an 8-bit quantized Vision Transformer (ViT) outperformed the gradient-based Tent method applied to the full-precision 32-bit model, while achieving up to a 24-fold memory reduction. These results represent a significant advance in memory- and computation-efficient model adaptation, making FOA particularly suitable for edge devices.

Implications and Future Directions

Practically, this research has significant implications for deploying machine learning models in resource-constrained environments. Eliminating backpropagation reduces the memory footprint and computation of adaptation, and it can enhance data privacy by allowing adaptation to happen on-device rather than in the cloud. Theoretically, the use of derivative-free optimization for real-time model adaptation broadens the landscape of test-time adaptation strategies, paving the way for research on scaling such optimizers to more complex and higher-dimensional model structures.

The authors acknowledge multiple avenues for future research. These include refining the CMA-ES strategy to handle higher-dimensional problem spaces more effectively and exploring other derivative-free optimization methods or hybrid strategies that merge the merits of both gradient-based and forward-only approaches. Additionally, applying the FOA framework to other types of models beyond vision transformers could test its generalizability and adaptability across various domains.

In conclusion, the test-time adaptation technique proposed in this paper offers a pragmatic and efficient solution for enhancing model robustness against distribution shifts, especially for scenarios where computational resources are severely constrained. The intersection of derivative-free optimization with model adaptation opens up an exciting frontier that merits further exploration within the AI community.
