DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
Abstract: Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training models on source-domain data. With the recent advance of vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task becomes one of directly adapting the pre-trained source model to arbitrary target domains using prior domain knowledge. We name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and no support for fine-grained domain decomposition, which motivates us to establish a novel dataset, DomainVerse, for ADG. Built on a newly introduced hierarchical definition of domain shifts, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. Using DomainVerse and VLMs, we propose two methods, Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
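As an illustration of the tuning-free setting described in the abstract, the following is a minimal sketch of how prior domain knowledge might be injected into zero-shot classification through text prompts, with no parameter updates. It is not the paper's Domain CLIP or Domain++ CLIP. The prompt template, the domain attribute names (`weather`, `time`), and the helper names `build_domain_prompts` and `zero_shot_predict` are all assumptions made for this example. The embeddings are placeholder vectors standing in for the outputs of a real VLM's image and text encoders.

```python
import math


def build_domain_prompts(class_names, domain):
    """Compose one text prompt per class from a domain description.

    `domain` is a hypothetical dict of domain attributes; the template
    below is an assumption, not the prompt used in the paper.
    """
    context = ", ".join(f"{key}: {value}" for key, value in domain.items())
    return [f"a photo of a {name}, {context}" for name in class_names]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities.

    No weights are trained or updated: the class scores depend only on
    the fixed embeddings, which is what makes the procedure tuning-free.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical target domain and classes.
prompts = build_domain_prompts(["car", "bus"], {"weather": "foggy", "time": "night"})

# Placeholder embeddings; a real pipeline would encode `prompts` and the
# image with a pre-trained VLM instead.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [0.9, 0.1]
probs = zero_shot_predict(image_emb, text_embs)
```

In this sketch, adapting to a new target domain only requires changing the `domain` dictionary, so the model itself is never fine-tuned.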