
Decoupled Prototype Learning for Reliable Test-Time Adaptation (2401.08703v2)

Published 15 Jan 2024 in cs.LG

Abstract: Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach fine-tunes the model with a cross-entropy loss computed on estimated pseudo-labels. However, its performance is significantly affected by noisy pseudo-labels. This study reveals that minimizing the classification error of each sample is what makes the cross-entropy loss vulnerable to label noise. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of the class prototypes: for each prototype, we reduce its distance to positive samples and enlarge its distance to negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy that improves DPL's robustness to the small batch sizes often encountered in TTA: each class's pseudo-feature is updated from a memory in a momentum manner, and an additional DPL loss is applied to it. Finally, we introduce a consistency-regularization-based approach that leverages samples with unconfident pseudo-labels by transferring their feature styles to samples with confident pseudo-labels, thereby creating more reliable samples for TTA. Experimental results demonstrate that our method achieves state-of-the-art performance on domain generalization benchmarks and reliably improves self-training-based methods on image corruption benchmarks. The code will be released.
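The two core mechanisms in the abstract lend themselves to a brief illustration. Below is a minimal PyTorch sketch, written only from the description above, of (a) a prototype-centric contrastive loss in which each class prototype is pulled toward its positives and pushed from all other samples, and (b) a momentum update of per-class pseudo-features held in a memory. The function names, the temperature `tau`, the cosine-similarity formulation, and the momentum value are our assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def dpl_loss(features, pseudo_labels, prototypes, tau=0.1):
    """Prototype-centric contrastive loss (illustrative sketch).

    features:      (N, D) feature batch
    pseudo_labels: (N,)   hard pseudo-labels in [0, C)
    prototypes:    (C, D) class prototypes
    """
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    # (C, N): similarity of each prototype to every sample in the batch.
    sims = prototypes @ features.t() / tau
    # Softmax over *samples*, per prototype: positives are pulled toward
    # the prototype while the remaining samples act as negatives through
    # the normalizer, so each prototype's term is optimized independently.
    log_probs = F.log_softmax(sims, dim=1)
    loss, active = features.new_zeros(()), 0
    for c in range(prototypes.size(0)):
        pos = pseudo_labels == c
        if pos.any():
            loss = loss - log_probs[c, pos].mean()
            active += 1
    return loss / max(active, 1)

@torch.no_grad()
def update_memory(memory, features, pseudo_labels, momentum=0.9):
    """Momentum update of each class's pseudo-feature, refreshed per batch."""
    for c in pseudo_labels.unique():
        mask = pseudo_labels == c
        memory[c] = momentum * memory[c] + (1 - momentum) * features[mask].mean(dim=0)
    return memory
```

In this reading, taking the softmax over samples rather than over classes is what decouples the prototypes: a mislabeled sample perturbs how one prototype ranks the batch instead of forcing a full per-sample classification decision, which matches the abstract's claim that prototype-centric computation curbs overfitting to noisy pseudo-labels.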
