Any-Shift Prompting for Generalization over Distributions (2402.10099v1)
Abstract: Image-language models with prompt learning have shown remarkable advances in numerous downstream vision tasks. Nevertheless, conventional prompt learning methods overfit the training distribution and lose generalization ability on test distributions. To improve generalization across various distribution shifts, we propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning. We explicitly connect training and test distributions in the latent space by constructing training and test prompts in a hierarchical architecture. Within this framework, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from the training distribution to any test distribution. To effectively encode the distribution information and the relationships between distributions, we further introduce a transformer inference network with a pseudo-shift training mechanism. The network generates the tailored test prompt with both training and test information in a single feedforward pass, avoiding extra training costs at test time. Extensive experiments on twenty-three datasets demonstrate the effectiveness of any-shift prompting for generalization across various distribution shifts.
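The abstract's core mechanism, a transformer inference network that turns a training prompt plus test-distribution information into a tailored test prompt in one feedforward pass, can be illustrated with a minimal sketch. All module names, dimensions, and the token layout below are assumptions for illustration only, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch) of a feedforward prompt-inference network.
# The class name, dimensions, and token layout are illustrative assumptions;
# they do not reproduce the paper's exact architecture.
import torch
import torch.nn as nn


class PromptInferenceNetwork(nn.Module):
    """Generates a test prompt from training-prompt tokens and test features."""

    def __init__(self, dim: int = 512, prompt_len: int = 4, num_layers: int = 2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Learnable query tokens that are read out as the test prompt.
        self.prompt_queries = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, train_prompt: torch.Tensor, test_tokens: torch.Tensor) -> torch.Tensor:
        # train_prompt: (prompt_len, dim) -- prompt learned on the training distribution
        # test_tokens:  (batch, n_tokens, dim) -- encoded test-distribution information
        batch = test_tokens.shape[0]
        queries = self.prompt_queries.unsqueeze(0).expand(batch, -1, -1)
        context = train_prompt.unsqueeze(0).expand(batch, -1, -1)
        # Concatenate queries, training prompt, and test tokens; one feedforward pass.
        tokens = torch.cat([queries, context, test_tokens], dim=1)
        out = self.encoder(tokens)
        # The first prompt_len output tokens form the tailored test prompt.
        return out[:, : queries.shape[1], :]


if __name__ == "__main__":
    net = PromptInferenceNetwork()
    train_prompt = torch.randn(4, 512)      # learned training prompt tokens
    test_tokens = torch.randn(8, 16, 512)   # e.g. CLIP features of a test batch
    test_prompt = net(train_prompt, test_tokens)
    print(test_prompt.shape)                # torch.Size([8, 4, 512])
```

The design point this sketch tries to capture is that the test prompt is produced by a single forward pass conditioned on both training and test information, so no gradient-based adaptation is needed at test time.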