A Simple Recipe for Language-guided Domain Generalized Segmentation (2311.17922v2)
Abstract: Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques require external images for augmentation, aim to learn invariant representations by imposing various alignment constraints, or both. Large-scale pretraining has recently shown promising generalization capabilities, along with the potential of binding different modalities. For instance, the advent of vision-language models like CLIP has opened the doorway for vision models to exploit the textual modality. In this paper, we introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization. Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning, (ii) language-driven local style augmentation, and (iii) randomization by locally mixing the source and augmented styles during training. Extensive experiments report state-of-the-art results on various generalization benchmarks. Code is accessible at https://github.com/astra-vision/FAMix.
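To make ingredient (iii) concrete, the following is a minimal sketch of one plausible reading of "locally mixing the source and augmented styles": a feature map is split into a grid of patches, and each patch is renormalized to a random per-channel interpolation between its own mean and standard deviation and a set of augmented statistics. Everything here is an assumption for illustration and is not taken from the authors' code: the function name `local_style_mix`, the parameters `aug_mean`, `aug_std`, and `grid`, the uniform sampling of the mixing weight, and the use of channel-wise statistics as the notion of "style". In the paper the augmented statistics come from the language-driven step (ii), which is not reproduced here; the sketch accepts them as plain arrays.

```python
import numpy as np


def local_style_mix(feat, aug_mean, aug_std, grid=3, rng=None, eps=1e-5):
    """Patch-wise style mixing on a single feature map (illustrative only).

    feat:     array of shape (C, H, W), the source features.
    aug_mean: array of shape (grid, grid, C), hypothetical augmented means.
    aug_std:  array of shape (grid, grid, C), hypothetical augmented stds.
    """
    rng = np.random.default_rng() if rng is None else rng
    feat = np.asarray(feat, dtype=float)
    channels, height, width = feat.shape
    out = np.empty_like(feat)

    # Patch boundaries along each spatial axis.
    rows = np.linspace(0, height, grid + 1).astype(int)
    cols = np.linspace(0, width, grid + 1).astype(int)

    for i in range(grid):
        for j in range(grid):
            patch = feat[:, rows[i]:rows[i + 1], cols[j]:cols[j + 1]]

            # Channel-wise statistics of the source patch (its "style").
            mu = patch.mean(axis=(1, 2), keepdims=True)
            sigma = patch.std(axis=(1, 2), keepdims=True) + eps

            # Random per-channel weight between source and augmented style.
            alpha = rng.uniform(0.0, 1.0, size=(channels, 1, 1))
            mix_mu = alpha * mu + (1 - alpha) * aug_mean[i, j].reshape(-1, 1, 1)
            mix_sigma = alpha * sigma + (1 - alpha) * aug_std[i, j].reshape(-1, 1, 1)

            # Normalize the patch, then re-style it with the mixed statistics.
            out[:, rows[i]:rows[i + 1], cols[j]:cols[j + 1]] = (
                mix_sigma * (patch - mu) / sigma + mix_mu
            )
    return out
```

Because the mixing weight is resampled per patch and per channel on every call, the same source image yields a different locally stylized feature map at each training step, which is the sense in which mixing acts as randomization in this sketch.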