SETA: Semantic-Aware Token Augmentation for Domain Generalization (2403.11792v2)

Published 18 Mar 2024 in cs.CV

Abstract: Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptron (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models, revealing that their performance is suboptimal because they fail to incentivize the model to learn holistic shape information. To tackle the issue, we propose the SEmantic-aware Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby enhancing the model's learning of shape information. To further enhance the generalization ability of the model, we introduce two stylized variants of our method combined with two state-of-the-art style augmentation methods in DG. We provide a theoretical insight into our method, demonstrating its effectiveness in reducing the generalization risk bound. Comprehensive experiments on five benchmarks show that our method achieves SOTA performance across various ViT and MLP architectures. Our code is available at https://github.com/lingeringlight/SETA.
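
The abstract describes the core operation only at a high level: token features are split into a shape-like component, which is preserved, and an edge-like component, which is perturbed. Below is a minimal PyTorch sketch of that idea, for intuition only. The function name `seta_token_augment`, the use of a Fourier split along the token dimension to separate the two components, the band cutoff, and the cross-sample mixing rule are all illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch

def seta_token_augment(tokens: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch of edge-perturbing token augmentation.

    tokens: (B, N, C) patch-token features from a ViT/MLP block.
    Assumptions (not stated in the abstract): "edge cues" are
    approximated by the high-frequency band of the token sequence,
    "global shape" by the low-frequency band, and edges are
    perturbed by mixing them with those of another batch sample.
    """
    B, N, C = tokens.shape
    # FFT along the token dimension separates slow (shape-like)
    # from fast (edge-like) variation across tokens.
    freq = torch.fft.fft(tokens, dim=1)
    cutoff = max(N // 4, 1)  # assumed split between the two bands
    mask = torch.zeros(1, N, 1, device=tokens.device)
    mask[:, :cutoff] = 1.0
    mask[:, -cutoff:] = 1.0  # low FFT frequencies sit at both ends
    shape_part = freq * mask          # preserved
    edge_part = freq * (1.0 - mask)   # perturbed below
    # Perturb only the edge band: convex mix with a shuffled peer.
    perm = torch.randperm(B, device=tokens.device)
    lam = torch.rand(B, 1, 1, device=tokens.device) * alpha
    edge_mixed = (1.0 - lam) * edge_part + lam * edge_part[perm]
    return torch.fft.ifft(shape_part + edge_mixed, dim=1).real
```

During training, such a call would be inserted after an encoder block, e.g. `tokens = seta_token_augment(tokens)`, so that the classifier must rely on the preserved low-frequency (shape-like) structure rather than the perturbed edge cues.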
