
Prompt Learning via Meta-Regularization (2404.00851v1)

Published 1 Apr 2024 in cs.CV

Abstract: Pre-trained vision-language models have shown impressive success on various computer vision tasks thanks to their zero-shot generalizability. Recently, prompt learning approaches have been explored to adapt vision-language models to a variety of downstream tasks efficiently and effectively. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language model is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR), which improves the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments tasks to generate multiple virtual tasks to alleviate meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.
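The abstract describes a bi-level optimization: an inner step adapts the soft prompts under a learned regularizer, and an outer (meta) step updates both the prompts and the regularizer, evaluated on augmented "virtual" tasks. The PyTorch sketch below illustrates only that structure; all shapes, the toy prompted classifier, the simple L2-style regularizer, and the mixup-style augmentation are hypothetical stand-ins, not the authors' implementation (which is available at the linked repository).

import torch
import torch.nn as nn
import torch.nn.functional as F

# Conceptual sketch of meta-regularized prompt tuning.
# All names, shapes, and the toy data are illustrative assumptions.
num_classes, dim = 10, 512
prompt = nn.Parameter(0.02 * torch.randn(4, dim))          # learnable soft prompt tokens
class_embed = torch.randn(num_classes, dim)                 # frozen class (text) embeddings, stand-in
reg_strength = nn.Parameter(torch.zeros(()))                # meta-learned regularizer parameter
meta_opt = torch.optim.Adam([prompt, reg_strength], lr=1e-3)

def task_loss(p, batch):
    """Downstream loss with prompt p; stand-in for a CLIP-style prompted classifier."""
    x, y = batch                                             # x: (B, dim) image features, y: (B,) labels
    logits = x @ (class_embed + p.mean(dim=0)).t()
    return F.cross_entropy(logits, y)

def make_virtual_task(batch):
    """Stand-in for task augmentation (e.g. feature-level mixup) producing a virtual task."""
    x, y = batch
    lam = 0.9
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y                  # labels kept unchanged for simplicity

for _ in range(3):                                           # toy training loop
    batch = (torch.randn(8, dim), torch.randint(0, num_classes, (8,)))

    # Inner step: adapt the prompt on the task loss plus the learned regularizer.
    inner = task_loss(prompt, batch) + torch.sigmoid(reg_strength) * prompt.pow(2).mean()
    grad = torch.autograd.grad(inner, prompt, create_graph=True)[0]
    adapted_prompt = prompt - 0.01 * grad                    # one differentiable SGD step

    # Outer step: evaluate the adapted prompt on a virtual task and update both the
    # prompt and the regularizer by backpropagating through the inner step.
    meta_loss = task_loss(adapted_prompt, make_virtual_task(batch))
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()

The create_graph=True flag is what makes the regularizer meta-learnable here: the outer loss reaches reg_strength only through the inner gradient step, so second-order gradients must be retained.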

Authors (3)
  1. Jinyoung Park (46 papers)
  2. Juyeon Ko (6 papers)
  3. Hyunwoo J. Kim (70 papers)
Citations (5)