Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach (2404.11732v1)
Abstract: The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multi-scale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with limited examples, and the base prompts, learned with abundant data. This mechanism enriches the novel prompts without deteriorating the base class performance. Overall, this form of prompting helps us achieve state-of-the-art performance for GFSS on two different benchmark datasets: COCO-$20^i$ and PASCAL-$5^i$, without the need for test-time optimization (or transduction). Furthermore, test-time optimization leveraging unlabelled test data can be used to improve the prompts, which we refer to as transductive prompt tuning.
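The key mechanism in the abstract is the unidirectional (causal) attention between the two prompt groups: novel prompts may attend to base prompts to borrow information, while base prompts are blocked from attending to novel prompts so base-class behaviour is preserved. Below is a minimal, hedged sketch of that masking idea in PyTorch; it is not the authors' implementation, and all tensor names, shapes, and split sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of unidirectional attention between
# base and novel prompt tokens: novel prompts can attend to base prompts, but
# base prompts are masked from attending to novel prompts.
import torch
import torch.nn.functional as F


def prompt_attention_mask(num_base: int, num_novel: int) -> torch.Tensor:
    """Boolean mask of shape (num_base+num_novel, num_base+num_novel);
    True marks query/key pairs that are blocked from attending."""
    n = num_base + num_novel
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Block base-prompt queries (rows) from attending to novel-prompt keys (columns).
    mask[:num_base, num_base:] = True
    return mask


def masked_self_attention(prompts: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Plain single-head self-attention over prompt tokens; blocked pairs are
    set to -inf before the softmax. prompts: (num_prompts, dim)."""
    scores = prompts @ prompts.t() / prompts.shape[-1] ** 0.5
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ prompts


if __name__ == "__main__":
    num_base, num_novel = 15, 5          # illustrative split sizes, not from the paper
    prompts = torch.randn(num_base + num_novel, 256)
    out = masked_self_attention(prompts, prompt_attention_mask(num_base, num_novel))
    print(out.shape)                     # torch.Size([20, 256])
```

With this mask, the rows corresponding to base prompts are computed exactly as they would be without any novel prompts present, which is one simple way to realise "enriching the novel prompts without deteriorating base class performance".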
Authors: Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little