PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization (2404.09011v1)
Abstract: Domain Generalization (DG) aims to resolve distribution shifts between source and target domains, and current DG methods default to the setting in which source and target domains share identical categories. In practical scenarios, however, target domains may contain unseen classes. To address this issue, Open Set Domain Generalization (OSDG) has emerged, and several methods have been proposed specifically for it. However, most existing methods adopt complex architectures with only slight improvements over DG methods. Recently, vision-language models (VLMs) have been introduced to DG under the fine-tuning paradigm, but fine-tuning large vision models incurs huge training overhead. Therefore, in this paper, we transfer knowledge from VLMs to lightweight vision models and improve robustness by introducing Perturbation Distillation (PD) from three perspectives, Score, Class, and Instance (SCI), named SCI-PD. Moreover, previous methods are oriented toward benchmarks with identical and fixed splits, ignoring the divergence between source domains. These methods are revealed to suffer sharp performance decay under our proposed benchmark, Hybrid Domain Generalization (HDG), and a novel metric, $H^{2}$-CV, which construct various splits to comprehensively assess the robustness of algorithms. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms on multiple datasets, and in particular improves robustness when confronting data scarcity.
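The core idea of distilling a VLM teacher into a lightweight student can be sketched with a standard soft-label distillation loss plus a perturbation on the teacher's similarity scores. The exact perturbations used in SCI-PD are not specified in the abstract, so the Gaussian score noise, the temperature value, and the function names below are all illustrative assumptions, not the paper's method:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """Soft-label knowledge distillation (Hinton et al., 2015):
    KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def score_perturbed_kd_loss(teacher_logits, student_logits, sigma=0.1, seed=0):
    """Hypothetical 'score-level' perturbation distillation: add Gaussian
    noise to the teacher's similarity scores before distilling, so the
    student is trained against a smoothed, more robust target. This is a
    sketch of the general idea only, not the SCI-PD formulation."""
    rng = np.random.default_rng(seed)
    noisy = teacher_logits + rng.normal(0.0, sigma, size=teacher_logits.shape)
    return kd_kl_loss(noisy, student_logits)

# Illustrative usage with toy per-class scores for one image:
teacher = np.array([[2.0, 0.5, -1.0]])   # e.g. CLIP image-text similarities
student = np.array([[1.2, 0.7, -0.4]])   # lightweight model's logits
plain = kd_kl_loss(teacher, student)
perturbed = score_perturbed_kd_loss(teacher, student, sigma=0.3)
```

A matched teacher and student give a loss of (numerically) zero, and the perturbed variant simply replaces the teacher target with a noisy copy; the class- and instance-level perturbations mentioned in the abstract would act on other parts of the pipeline.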
Authors: Zining Chen, Weiqiu Wang, Zhicheng Zhao, Fei Su, Aidong Men, Hongying Meng