
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization (2401.09716v1)

Published 18 Jan 2024 in cs.CV and cs.AI

Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend invariant and domain-specific aspects. Such an approach struggles with nuanced differentiation of inter-domain variations and may exhibit bias towards certain domains, hindering the precise learning of domain-invariant features. Recognizing this, we introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting generalization. Building on the emerging trend of visual prompts in the DG paradigm, our work introduces the novel Hierarchical Contrastive Visual Prompt (HCVP) methodology. This represents a significant advancement in the field, setting itself apart with a unique generative approach to prompts, alongside an explicit model structure and specialized loss functions. Differing from traditional visual prompts that are often shared across an entire dataset, HCVP utilizes a hierarchical prompt generation network enhanced by prompt contrastive learning. These generated prompts are instance-dependent, catering to the unique characteristics inherent to different domains and tasks. Additionally, we devise a prompt modulation network that serves as a bridge, effectively incorporating the generated visual prompts into the vision transformer backbone. Experiments conducted on five DG datasets demonstrate the effectiveness of HCVP, outperforming both established DG algorithms and adaptation protocols.
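
To make the described pipeline concrete, below is a minimal, self-contained PyTorch sketch of the three components the abstract names: an instance-dependent, two-level (domain and task) prompt generator, an InfoNCE-style prompt contrastive loss, and a modulation step that injects the generated prompts into a ViT token sequence. This is an illustrative reconstruction of the general technique, not the authors' released implementation; all module names, layer sizes, prompt lengths, and the backbone stand-in are assumptions.

```python
# Hedged sketch of HCVP-style components (not the authors' code).
# Assumptions: ViT-Base dimensions (768), 4 prompt tokens per level,
# and a frozen-encoder feature as the "instance" input.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalPromptGenerator(nn.Module):
    """Generates domain-level and task-level prompts from an instance feature."""

    def __init__(self, feat_dim=768, prompt_len=4):
        super().__init__()
        self.domain_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, prompt_len * feat_dim))
        self.task_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, prompt_len * feat_dim))
        self.prompt_len, self.feat_dim = prompt_len, feat_dim

    def forward(self, inst_feat):                    # inst_feat: (B, feat_dim)
        b = inst_feat.size(0)
        dom = self.domain_head(inst_feat).view(b, self.prompt_len, self.feat_dim)
        tsk = self.task_head(inst_feat).view(b, self.prompt_len, self.feat_dim)
        return dom, tsk                              # per-instance prompts


def prompt_contrastive_loss(prompts, domain_labels, tau=0.1):
    """InfoNCE-style loss: same-domain prompts attract, others repel."""
    z = F.normalize(prompts.flatten(1), dim=1)       # (B, prompt_len * feat_dim)
    sim = z @ z.t() / tau                            # (B, B) scaled similarities
    b = z.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=z.device)
    pos = (domain_labels.unsqueeze(0) == domain_labels.unsqueeze(1)) & ~eye
    sim = sim.masked_fill(eye, float('-inf'))        # never pair a sample with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos.sum(1).clamp(min=1)              # guard against zero positives
    return -(log_prob.masked_fill(~pos, 0).sum(1) / pos_count).mean()


class PromptModulation(nn.Module):
    """Bridges generated prompts into the backbone's token sequence."""

    def __init__(self, feat_dim=768):
        super().__init__()
        self.proj = nn.Linear(feat_dim, feat_dim)    # align prompt and token spaces

    def forward(self, patch_tokens, dom, tsk):       # patch_tokens: (B, N, feat_dim)
        prompts = self.proj(torch.cat([dom, tsk], dim=1))
        return torch.cat([prompts, patch_tokens], dim=1)  # prepend prompt tokens


if __name__ == "__main__":
    B, N, D = 8, 196, 768
    inst_feat = torch.randn(B, D)                    # e.g. a frozen-encoder feature
    tokens = torch.randn(B, N, D)                    # stand-in for ViT patch tokens
    gen, mod = HierarchicalPromptGenerator(D), PromptModulation(D)
    dom, tsk = gen(inst_feat)
    fused = mod(tokens, dom, tsk)                    # (B, N + 8, D) feeds the ViT
    loss = prompt_contrastive_loss(dom, torch.randint(0, 3, (B,)))
    print(fused.shape, loss.item())
```

In such a setup, each training batch would mix samples from several source domains so the contrastive term can pull same-domain prompts together while pushing different-domain prompts apart, which is what makes the prompts domain-aware rather than dataset-wide constants.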

Authors (8)
  1. Guanglin Zhou (9 papers)
  2. Zhongyi Han (17 papers)
  3. Shiming Chen (29 papers)
  4. Biwei Huang (54 papers)
  5. Liming Zhu (101 papers)
  6. Tongliang Liu (251 papers)
  7. Lina Yao (194 papers)
  8. Kun Zhang (353 papers)
Citations (1)