DePT: Decoupled Prompt Tuning (2309.07439v2)
Abstract: This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning methods, hence it can improve all of them. Extensive experiments on 11 datasets show the strong flexibility and effectiveness of DePT. Our code and pretrained models are available at https://github.com/Koorye/DePT.
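The decoupling idea in the abstract can be illustrated with a minimal sketch: a lightweight head routes base-specific knowledge into an isolated feature space used only for base-task classification, while the original (task-shared) features are left untouched for zero-shot matching on new tasks. This is an illustrative simplification under assumed shapes, not the authors' implementation; the names `DecoupledHead`, `isolate`, and `base_classifier` are hypothetical.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Illustrative sketch of the DePT idea (not the paper's exact code):
    base-specific knowledge is absorbed by an isolated linear projection,
    preserving the original feature space for new-task generalization."""

    def __init__(self, dim: int, num_base_classes: int):
        super().__init__()
        # Hypothetical projection into the isolated feature space.
        self.isolate = nn.Linear(dim, dim)
        # Classifier over base-task classes, applied in the isolated space.
        self.base_classifier = nn.Linear(dim, num_base_classes)

    def forward(self, features: torch.Tensor):
        # Base-task logits come from the isolated space ...
        base_logits = self.base_classifier(self.isolate(features))
        # ... while the original features pass through unchanged, so the
        # zero-shot path (e.g., matching against text embeddings) still
        # sees task-shared knowledge.
        return base_logits, features

head = DecoupledHead(dim=512, num_base_classes=100)
x = torch.randn(4, 512)
logits, shared = head(x)
print(logits.shape, shared.shape)
```

At inference, base-task predictions would use `base_logits`, whereas new-task predictions would ignore the isolated head entirely and rely on the unchanged `features`, which is what allows the two objectives to stop competing for the same channels.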