Domain-Aware Fine-Tuning of Foundation Models (2407.03482v2)
Published 3 Jul 2024 in cs.CV, cs.AI, and cs.LG
Abstract: Foundation models (FMs) have revolutionized computer vision, enabling effective learning across different domains. However, their performance under domain shift remains underexplored. This paper investigates the zero-shot domain adaptation potential of FMs by comparing different backbone architectures and introducing novel domain-aware components that leverage domain-related textual embeddings. We propose a domain-adaptive normalization, termed Domino, which explicitly leverages domain embeddings during fine-tuning, thus making the model domain-aware. Ultimately, Domino enables more robust computer vision models that can adapt effectively to various unseen domains.
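The abstract does not spell out how Domino injects domain embeddings into normalization, so the following is only a minimal sketch of the general idea of embedding-conditioned normalization, not the paper's implementation. It assumes a layer-norm-style layer whose scale and shift are linear functions of a domain text embedding. The function name `domain_conditioned_layer_norm`, the projection matrices `w_gamma` and `w_beta`, and all shapes are hypothetical, and the random vector stands in for a real text embedding of a domain description.

```python
import numpy as np


def domain_conditioned_layer_norm(x, domain_emb, w_gamma, w_beta, eps=1e-5):
    """Normalize features, then modulate them with a domain embedding.

    x:          (tokens, channels) feature matrix
    domain_emb: (emb_dim,) embedding of a domain description (assumed given)
    w_gamma:    (emb_dim, channels) hypothetical projection producing the scale
    w_beta:     (emb_dim, channels) hypothetical projection producing the shift
    """
    # Standard layer normalization over the channel axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Scale and shift are predicted from the domain embedding.
    # The "1 +" offset means zero projections reduce to plain layer norm.
    gamma = 1.0 + domain_emb @ w_gamma
    beta = domain_emb @ w_beta
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
tokens, channels, emb_dim = 4, 8, 16
x = rng.normal(size=(tokens, channels))
domain_emb = rng.normal(size=emb_dim)  # placeholder for a real text embedding

# Zero-initialized projections: the layer behaves like ordinary layer norm.
w_zero = np.zeros((emb_dim, channels))
out_plain = domain_conditioned_layer_norm(x, domain_emb, w_zero, w_zero)

# Non-zero projections: the output now depends on the domain embedding.
w_gamma = 0.1 * rng.normal(size=(emb_dim, channels))
w_beta = 0.1 * rng.normal(size=(emb_dim, channels))
out_domain = domain_conditioned_layer_norm(x, domain_emb, w_gamma, w_beta)
```

Zero-initializing the projections is a common choice for conditioning layers because fine-tuning then starts from the unmodified pretrained behavior; whether Domino does this is not stated in the abstract.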