DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization

Published 5 Mar 2024 in cs.CV | (2403.02714v1)

Abstract: Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training a model on source-domain data. With the recent advance of vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task becomes one of directly adapting the pre-trained source model to arbitrary target domains using prior domain knowledge; we name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition, which motivates us to establish a novel dataset, DomainVerse, for ADG. Benefiting from the introduced hierarchical definition of domain shifts, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. Using DomainVerse and VLMs, we propose two methods, Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
