Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning (2404.00603v1)
Abstract: We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) while fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution, and then using the score generated by a dedicated competition-based scoring function to fuse the zero-shot and few-shot classifiers. The fused classifier is dynamic: it biases towards the zero-shot classifier if a sample is more likely to come from the distribution the model was pre-trained on, leading to improved base-to-novel generalization ability. Our method is applied only at test time, so it can boost existing methods without time-consuming re-training. Extensive experiments show that even weak distribution detectors can still improve VLMs' generalization ability. Specifically, with the help of OOD detectors, the harmonic means of CoOp and ProGrad increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting.
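The test-time fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's scoring function, which the abstract does not specify. It assumes a simple stand-in: each classifier's maximum softmax probability acts as a weak distribution detector, and the two scores compete through a temperature-scaled softmax to produce the fusion weights. The names `fuse_predictions` and `temperature` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(zero_shot_logits, few_shot_logits, temperature=0.1):
    """Fuse zero-shot and few-shot class probabilities for one sample.

    Assumption (not from the paper): the maximum softmax probability of
    each classifier serves as a weak detector score, and the two scores
    compete via a softmax to give the fusion weights.
    """
    p_zs = softmax(zero_shot_logits)
    p_fs = softmax(few_shot_logits)

    # Weak detector scores: how confident each classifier is on this sample.
    score_zs = max(p_zs)
    score_fs = max(p_fs)

    # Competition between the two scores; the weights sum to 1.
    w_zs, w_fs = softmax([score_zs / temperature, score_fs / temperature])

    # Per-sample (dynamic) mixture of the two probability vectors.
    fused = [w_zs * a + w_fs * b for a, b in zip(p_zs, p_fs)]
    return fused, w_zs


# The zero-shot classifier is confident while the few-shot one is nearly flat,
# so the fused prediction leans on the zero-shot classifier.
fused_probs, zero_shot_weight = fuse_predictions([5.0, 0.0, 0.0], [0.1, 0.2, 0.0])
predicted_class = fused_probs.index(max(fused_probs))
```

Because the weights are computed per sample and no parameters are trained, a scheme of this kind can be layered on an already tuned prompt learner without re-training, which matches the test-time-only property the abstract claims.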