Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning

Published 21 Nov 2023 in cs.CV (arXiv:2311.13612v2)

Abstract: Over the past year, a large body of multimodal research has emerged around zero-shot evaluation using GPT descriptors. These studies boost the zero-shot accuracy of pretrained VL models with an ensemble of label-specific text generated by GPT. A recent study, WaffleCLIP, demonstrated that similar zero-shot accuracy can be achieved with an ensemble of random descriptors. However, both zero-shot methods are untrainable and consequently sub-optimal when some few-shot out-of-distribution (OOD) training data is available. Inspired by these prior works, we present two more flexible methods called descriptor and word soups, which do not require an LLM at test time and can leverage training data to increase OOD target accuracy. Descriptor soup greedily selects a small set of textual descriptors using generic few-shot training data, then calculates robust class embeddings using the selected descriptors. Word soup greedily assembles a chain of words in a similar manner. Compared to existing few-shot soft prompt tuning methods, word soup requires fewer parameters by construction and less GPU memory, since it does not require backpropagation. Both soups outperform current published few-shot methods, even when combined with SoTA zero-shot methods, on cross-dataset and domain generalization benchmarks. Compared with SoTA prompt and descriptor ensembling methods, such as ProDA and WaffleCLIP, word soup achieves higher OOD accuracy with fewer ensemble members. Please check out our code: github.com/Chris210634/word_soups
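The abstract describes the soups only at a high level. The following is a minimal sketch of one plausible greedy descriptor-soup loop, assuming a CLIP-style text_encoder that maps a list of prompts to embedding vectors and precomputed few-shot image features img_feats with labels; all names here are illustrative and are not taken from the authors' repository (see the linked code for the actual implementation).

import torch
import torch.nn.functional as F

def class_embeddings(descriptors, classnames, text_encoder):
    """Average each class's text embeddings over all selected descriptors."""
    embs = []
    for name in classnames:
        prompts = [f"{name}, {d}" for d in descriptors]  # e.g. "dog, which has fur."
        e = text_encoder(prompts)                        # (num_descriptors, dim)
        embs.append(F.normalize(e.mean(dim=0), dim=-1))  # robust class centroid
    return torch.stack(embs)                             # (num_classes, dim)

def accuracy(img_feats, labels, class_embs):
    """Few-shot accuracy of nearest-class-embedding classification."""
    preds = (img_feats @ class_embs.T).argmax(dim=-1)
    return (preds == labels).float().mean().item()

def greedy_descriptor_soup(pool, classnames, text_encoder,
                           img_feats, labels, max_size=8):
    """Greedily grow the descriptor set while training accuracy improves."""
    soup, best = [], 0.0
    for _ in range(max_size):
        candidates = [d for d in pool if d not in soup]
        if not candidates:
            break
        scored = [
            (accuracy(img_feats, labels,
                      class_embeddings(soup + [d], classnames, text_encoder)), d)
            for d in candidates
        ]
        acc, chosen = max(scored)
        if acc <= best:  # stop once no remaining descriptor helps
            break
        soup.append(chosen)
        best = acc
    return soup

Word soup would follow the same pattern, except that the greedy loop appends individual words to a growing prompt rather than selecting whole descriptors from a pool. Note that both loops use only forward passes through the text encoder, consistent with the abstract's point that no backpropagation (and hence little GPU memory) is required.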

References (71)
  1. A simple zero-shot prompt weighting technique to improve prompt ensembling in text-image models. In International Conference on Machine Learning, pages 547–568. PMLR, 2023.
  2. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, 2014.
  3. LASP: Text-to-text optimization for language-aware soft prompting of vision & language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23232–23241, 2023.
  4. Domain generalization by mutual-information regularization with pre-trained models. In European Conference on Computer Vision, pages 440–457. Springer, 2022.
  5. Prompt learning with optimal transport for vision-language models. arXiv preprint arXiv:2210.01253, 2022.
  6. PromptStyler: Prompt-driven style generation for source-free domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15702–15712, 2023.
  7. Describing textures in the wild. CoRR, abs/1311.3618, 2013.
  8. Learning expressive prompting with residuals for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3366–3377, 2023.
  9. Bayesian prompt learning for image-language model generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15237–15246, 2023.
  10. Diverse data augmentation with diffusions for effective test-time prompt tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2704–2714, 2023.
  11. CLIP-Adapter: Better vision-language models with feature adapters. International Journal of Computer Vision, pages 1–15, 2023.
  12. CPL: Counterfactual prompt learning for vision and language models. arXiv preprint arXiv:2210.10362, 2022.
  13. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019.
  14. The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349, 2021a.
  15. Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021b.
  16. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019.
  17. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  18. VL-PET: Vision-and-language parameter-efficient tuning via granularity control. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3010–3020, 2023.
  19. OpenCLIP, 2021.
  20. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022.
  21. Convolutional bypasses are better vision transformer adapters. arXiv preprint arXiv:2207.07039, 2022.
  22. Knowledge-aware prompt tuning for generalizable vision-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15670–15680, 2023.
  23. Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4893–4902, 2019.
  24. Multi-modal classifiers for open-vocabulary object detection. arXiv preprint arXiv:2306.05493, 2023.
  25. MaPLe: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19113–19122, 2023a.
  26. Self-regulating prompts: Foundational model adaptation without forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15190–15200, 2023b.
  27. 3d object representations for fine-grained categorization. In Proceedings of the IEEE international conference on computer vision workshops, pages 554–561, 2013.
  28. Caltech 101, 2022.
  29. Scaling & shifting your features: A new baseline for efficient model tuning. Advances in Neural Information Processing Systems, 35:109–123, 2022.
  30. Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19325–19337, 2023.
  31. Prompt distribution learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5206–5215, 2022.
  32. Towards efficient visual adaption via structural re-parameterization. arXiv preprint arXiv:2302.08106, 2023.
  33. SwapPrompt: Test-time prompt adaptation for vision-language models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  34. Fine-grained visual classification of aircraft. CoRR, abs/1306.5151, 2013.
  35. Visual classification via description from large language models. arXiv preprint arXiv:2210.07183, 2022.
  36. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pages 722–729, 2008.
  37. Quantified task misalignment to inform PEFT: An exploration of domain generalization and catastrophic forgetting in CLIP. In ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2024.
  38. CHiLS: Zero-shot image classification with hierarchical label sets. In International Conference on Machine Learning, pages 26342–26362. PMLR, 2023.
  39. SVL-Adapter: Self-supervised adapter for vision-language pretrained models. arXiv preprint arXiv:2210.03794, 2022.
  40. Cats and dogs. In 2012 IEEE conference on computer vision and pattern recognition, pages 3498–3505. IEEE, 2012.
  41. SgVA-CLIP: Semantic-guided visual adapting of vision-language models for few-shot image classification. IEEE Transactions on Multimedia, 2023.
  42. What does a platypus look like? generating customized prompts for zero-shot image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15691–15701, 2023.
  43. Learning transferable visual models from natural language supervision, 2021.
  44. Do imagenet classifiers generalize to imagenet? CoRR, abs/1902.10811, 2019.
  45. Waffling around for performance: Visual classification with random words and broad concepts. arXiv preprint arXiv:2306.07282, 2023.
  46. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
  47. Semi-supervised domain adaptation via minimax entropy. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8050–8058, 2019.
  48. Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  49. LoGoPrompt: Synthetic text images can be good visual prompts for vision-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2932–2941, 2023.
  50. Test-time prompt tuning for zero-shot generalization in vision-language models. Advances in Neural Information Processing Systems, 35:14274–14289, 2022.
  51. CLIPood: Generalizing CLIP to out-of-distributions, 2023.
  52. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR, abs/1212.0402, 2012.
  53. LST: Ladder side-tuning for parameter and memory efficient transfer learning. Advances in Neural Information Processing Systems, 35:12991–13005, 2022a.
  54. VL-Adapter: Parameter-efficient transfer learning for vision-and-language tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5227–5237, 2022b.
  55. Robust fine-tuning of vision-language models for domain generalization. In IEEE High Performance Extreme Computing Conference (HPEC), 2023.
  56. Learning robust global representations by penalizing local predictive power. CoRR, abs/1905.13549, 2019.
  57. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149, 2022.
  58. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning, pages 23965–23998. PMLR, 2022a.
  59. Robust fine-tuning of zero-shot models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7959–7971, 2022b.
  60. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492, 2010.
  61. Class-aware visual prompt tuning for vision-language pre-trained model. arXiv preprint arXiv:2208.08340, 2022.
  62. Visual-language prompt tuning with knowledge-guided context optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6757–6767, 2023.
  63. Task residual for tuning vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10899–10909, 2023.
  64. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199, 2021.
  65. Unified vision and language prompt learning. arXiv preprint arXiv:2210.07225, 2022.
  66. Tip-Adapter: Training-free CLIP-Adapter for better vision-language modeling. arXiv preprint arXiv:2111.03930, 2021.
  67. Bridging theory and algorithm for domain adaptation. In International conference on machine learning, pages 7404–7413. PMLR, 2019.
  68. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022a.
  69. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825, 2022b.
  70. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022c.
  71. Prompt-aligned gradient for prompt tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15659–15669, 2023.