Decoupled Prototype Learning for Reliable Test-Time Adaptation (2401.08703v2)
Abstract: Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach fine-tunes the model with a cross-entropy loss computed on estimated pseudo-labels. However, its performance is significantly degraded by noisy pseudo-labels. This study reveals that minimizing the classification error of each individual sample is what makes the cross-entropy loss vulnerable to label noise. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of the class prototypes: for each class prototype, we reduce its distance to positive samples and enlarge its distance to negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy to enhance DPL's robustness to the small batch sizes often encountered in TTA: we maintain a pseudo-feature for each class in a memory, update it in a momentum manner, and impose an additional DPL loss on these pseudo-features. Finally, we introduce a consistency-regularization-based approach that leverages samples with unconfident pseudo-labels: it transfers the feature styles of samples with unconfident pseudo-labels to samples with confident pseudo-labels, thereby creating more reliable samples for TTA. The experimental results demonstrate that our method achieves state-of-the-art performance on domain generalization benchmarks and reliably improves the performance of self-training-based methods on image corruption benchmarks. The code will be released.
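Below is a minimal PyTorch sketch of the prototype-centric idea as we read it from the abstract: each class prototype gets its own binary contrastive objective over pseudo-labeled positives and negatives, instead of coupling all prototypes through a per-sample softmax cross-entropy, and per-class pseudo-features kept in a memory are refreshed with a momentum rule. The function names, the temperature `tau`, the log-sigmoid form, and the momentum coefficient are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def dpl_loss(features, pseudo_labels, prototypes, tau=0.1):
    """Prototype-centric contrastive loss (sketch).

    features:      (N, D) L2-normalized target features
    pseudo_labels: (N,)   hard pseudo-labels in [0, C)
    prototypes:    (C, D) L2-normalized class prototypes
    """
    sims = features @ prototypes.t() / tau  # (N, C) scaled cosine similarities
    loss = features.new_zeros(())
    num_classes = prototypes.size(0)
    for c in range(num_classes):
        pos = sims[pseudo_labels == c, c]   # samples pseudo-labeled as class c
        neg = sims[pseudo_labels != c, c]   # all remaining samples
        if pos.numel() > 0:
            loss = loss - F.logsigmoid(pos).mean()   # pull positives toward prototype c
        if neg.numel() > 0:
            loss = loss - F.logsigmoid(-neg).mean()  # push negatives away from prototype c
    return loss / num_classes


def momentum_update(memory, features, pseudo_labels, m=0.9):
    """Momentum update of the per-class pseudo-features in the memory.

    memory is a plain (C, D) tensor buffer, not a learnable parameter.
    """
    for c in pseudo_labels.unique():
        mean_feat = features[pseudo_labels == c].mean(dim=0)
        memory[c] = m * memory[c] + (1.0 - m) * mean_feat
    return F.normalize(memory, dim=1)
```

Decoupling means a noisy sample only perturbs the two per-prototype terms it enters, rather than the entire softmax distribution, which is one plausible reading of why the loss resists overfitting to noisy pseudo-labels.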
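The style-transfer step for consistency regularization plausibly amounts to an AdaIN-style statistics swap (Huang and Belongie, 2017): keep the content of a feature map from a sample with a confident pseudo-label, but adopt the channel-wise mean and standard deviation (the "style") of a sample with an unconfident pseudo-label. The confident pseudo-label then carries over to the re-styled feature. The shapes and the function name below are assumptions for illustration.

```python
import torch


def transfer_style(conf_feat, unconf_feat, eps=1e-5):
    # conf_feat, unconf_feat: (B, C, H, W) feature maps from samples with
    # confident / unconfident pseudo-labels, respectively.
    mu_c = conf_feat.mean(dim=(2, 3), keepdim=True)
    std_c = conf_feat.std(dim=(2, 3), keepdim=True) + eps  # eps avoids division by zero
    mu_u = unconf_feat.mean(dim=(2, 3), keepdim=True)
    std_u = unconf_feat.std(dim=(2, 3), keepdim=True)
    # Normalize the confident content, then re-style it with the
    # unconfident sample's channel statistics.
    return std_u * (conf_feat - mu_c) / std_c + mu_u
```

The re-styled features can then be supervised with the confident sample's pseudo-label, which is one way to realize the "more reliable samples" the abstract describes.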