
Domain-Aware Fine-Tuning of Foundation Models (2407.03482v2)

Published 3 Jul 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Foundation models (FMs) have revolutionized computer vision, enabling effective learning across different domains. However, their performance under domain shift remains underexplored. This paper investigates the zero-shot domain adaptation potential of FMs by comparing different backbone architectures and introducing novel domain-aware components that leverage domain-related textual embeddings. We propose domain adaptive normalization, termed Domino, which explicitly leverages domain embeddings during fine-tuning, thus making the model domain-aware. Ultimately, Domino enables more robust computer vision models that can adapt effectively to various unseen domains.
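
The abstract does not spell out how Domino is built, so the following is only a minimal sketch of the general idea of a normalization layer conditioned on a domain embedding, not the paper's implementation. The class name `DomainConditionedNorm`, the scale-and-shift formulation, the weight initialization, and the toy 3-dimensional "domain embeddings" are all assumptions made for illustration; in practice the embedding would presumably come from a text encoder applied to a domain description.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, bias, z):
    """Dense layer: one output per weight row."""
    return [sum(w * zi for w, zi in zip(row, z)) + b
            for row, b in zip(weights, bias)]


class DomainConditionedNorm:
    """Hypothetical layer: the scale and shift applied after normalization
    are predicted from a domain embedding (an assumed design, not Domino itself)."""

    def __init__(self, feat_dim, embed_dim, seed=0):
        rng = random.Random(seed)

        def small_matrix():
            return [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                    for _ in range(feat_dim)]

        self.w_gamma, self.b_gamma = small_matrix(), [0.0] * feat_dim
        self.w_beta, self.b_beta = small_matrix(), [0.0] * feat_dim

    def __call__(self, features, domain_embedding):
        normed = layer_norm(features)
        gamma = linear(self.w_gamma, self.b_gamma, domain_embedding)
        beta = linear(self.w_beta, self.b_beta, domain_embedding)
        # Scale is parameterized as 1 + gamma, so a zero embedding
        # reduces the layer to plain normalization.
        return [n * (1.0 + g) + b for n, g, b in zip(normed, gamma, beta)]


# Toy usage with made-up embeddings standing in for text features
# of two domain descriptions (for example "night" and "fog").
norm = DomainConditionedNorm(feat_dim=4, embed_dim=3)
features = [1.0, 2.0, 3.0, 4.0]
out_night = norm(features, [1.0, 0.0, 0.0])
out_fog = norm(features, [0.0, 1.0, 0.0])
out_neutral = norm(features, [0.0, 0.0, 0.0])
```

The same input features are modulated differently for each embedding, while the all-zero embedding leaves the normalized features unchanged. That is one simple way a model could be made "domain aware" at the normalization level under these assumptions.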
