Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization (2404.00710v1)

Published 31 Mar 2024 in cs.CV

Abstract: We delve into Open Domain Generalization (ODG), marked by domain and category shifts between the labeled source domains used for training and the unlabeled target domains encountered at test time. Existing solutions to ODG face limitations due to the constrained generalization of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harnessing the semantic prowess of the vision-language model CLIP. Our framework brings forth three primary innovations: Firstly, distinct from prevailing paradigms, we conceptualize ODG as a multi-class classification challenge encompassing both known and novel categories. Central to our approach is modeling a unique prompt tailored for detecting unknown class samples, and to train this, we employ a readily accessible Stable Diffusion model, elegantly generating proxy images for the open class. Secondly, aiming for domain-tailored classification (prompt) weights while ensuring a balance of precision and simplicity, we devise a novel visual style-centric prompt learning mechanism. Finally, we infuse images with class-discriminative knowledge derived from the prompt space to augment the fidelity of CLIP's visual embeddings. We introduce a novel objective to safeguard the continuity of this infused semantic intel across domains, especially for the shared classes. Through rigorous testing on diverse datasets, covering closed and open-set DG contexts, ODG-CLIP demonstrates clear supremacy, consistently outpacing peers with performance boosts between 8% and 16%. Code will be available at https://github.com/mainaksingha01/ODG-CLIP.
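To make the (K+1)-way framing concrete, the sketch below shows zero-shot CLIP classification over K known class prompts plus one extra prompt acting as the "unknown" category. This is a minimal illustration, not the authors' ODG-CLIP code: the class names, prompt template, image path, and the hand-written unknown prompt are assumptions (in the paper, the unknown prompt is learned with diffusion-generated proxy images rather than fixed text).

    # Minimal sketch (assumed setup, not the ODG-CLIP implementation):
    # score an image against K known-class prompts plus one "unknown" prompt.
    import torch
    import clip                      # OpenAI CLIP package
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    known_classes = ["dog", "elephant", "guitar"]        # hypothetical source classes
    prompts = [f"a photo of a {c}" for c in known_classes]
    prompts.append("a photo of an unknown object")       # stand-in for the learned unknown-class prompt

    image = preprocess(Image.open("query.jpg")).unsqueeze(0).to(device)
    text = clip.tokenize(prompts).to(device)

    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(text)
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        logits = 100.0 * image_feat @ text_feat.T        # scaled cosine similarities, as in CLIP

    pred = logits.argmax(dim=-1).item()
    label = known_classes[pred] if pred < len(known_classes) else "unknown"
    print(label)

In this reading, an image whose highest similarity falls on the extra prompt is rejected as open-set, while the remaining prompts handle the known classes; ODG-CLIP additionally conditions the prompts on visual style and refines the visual embeddings, which this sketch omits.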

Authors (6)
  1. Mainak Singha (20 papers)
  2. Ankit Jha (19 papers)
  3. Shirsha Bose (6 papers)
  4. Ashwin Nair (4 papers)
  5. Moloud Abdar (17 papers)
  6. Biplab Banerjee (63 papers)
Citations (6)