
A Simple Recipe for Language-guided Domain Generalized Segmentation (2311.17922v2)

Published 29 Nov 2023 in cs.CV

Abstract: Generalization to new domains not seen during training is one of the long-standing challenges in deploying neural networks in real-world applications. Existing generalization techniques rely on external images for augmentation, aim to learn invariant representations by imposing various alignment constraints, or both. Large-scale pretraining has recently shown promising generalization capabilities, along with the potential of binding different modalities. For instance, the advent of vision-language models like CLIP has opened the doorway for vision models to exploit the textual modality. In this paper, we introduce a simple framework for generalizing semantic segmentation networks by employing language as the source of randomization. Our recipe comprises three key ingredients: (i) the preservation of the intrinsic CLIP robustness through minimal fine-tuning, (ii) language-driven local style augmentation, and (iii) randomization by locally mixing the source and augmented styles during training. Extensive experiments report state-of-the-art results on various generalization benchmarks. Code is available at https://github.com/astra-vision/FAMix.
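
To make ingredient (iii) concrete, the sketch below shows one plausible way to randomize style locally by mixing AdaIN-style feature statistics between the source image and a language-augmented style. This is a minimal illustration, not the authors' implementation: the function name, the patch grid size, and the `aug_mean`/`aug_std` inputs (assumed to be produced by a separate prompt-driven augmentation step, not shown) are hypothetical; see the linked repository for the actual code.

```python
import torch

def mix_local_styles(feat, aug_mean, aug_std, grid=4, p=0.5, eps=1e-6):
    """Illustrative local style randomization (hypothetical, not the paper's exact code).

    feat               : source feature map, shape (B, C, H, W)
    aug_mean, aug_std  : per-patch channel statistics of a language-augmented
                         style, shape (B, C, grid, grid); assumed to come from
                         a prompt-driven augmentation step.
    For each of the grid x grid local patches, with probability p the patch is
    re-normalized with the augmented statistics (AdaIN-style); otherwise the
    original source style is kept.
    """
    B, C, H, W = feat.shape
    ph, pw = H // grid, W // grid
    out = feat.clone()
    for i in range(grid):
        for j in range(grid):
            patch = feat[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            # channel-wise statistics of the source patch
            mu = patch.mean(dim=(2, 3), keepdim=True)
            sigma = patch.std(dim=(2, 3), keepdim=True) + eps
            # normalize the patch, then re-style it with the augmented statistics
            normed = (patch - mu) / sigma
            styled = normed * aug_std[:, :, i, j, None, None] + aug_mean[:, :, i, j, None, None]
            # Bernoulli mask: per sample, keep the source style or switch to the augmented one
            keep_src = (torch.rand(B, 1, 1, 1, device=feat.device) > p).float()
            out[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = keep_src * patch + (1 - keep_src) * styled
    return out
```

Mixing at the patch level rather than over the whole feature map is what makes the randomization local, and the per-sample mask ensures that part of every batch is still seen in the original source style during training.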
