A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models (2401.11311v3)
Abstract: Few-shot semantic segmentation (FSS) is a crucial challenge in computer vision, driving extensive research into a diverse range of methods, from advanced meta-learning techniques to simple transfer learning baselines. With the emergence of vision foundation models (VFM) serving as generalist feature extractors, we seek to explore the adaptation of these models for FSS. While current FSS benchmarks focus on adapting pre-trained models to new tasks with few images, they emphasize in-domain generalization, making them less suitable for VFM trained on large-scale web datasets. To address this, we propose a novel realistic benchmark with a simple and straightforward adaptation process tailored for this task. Using this benchmark, we conduct a comprehensive comparative analysis of prominent VFM and semantic segmentation models. To evaluate their effectiveness, we leverage various adaptation methods, ranging from linear probing to parameter-efficient fine-tuning (PEFT) and full fine-tuning. Our findings show that models designed for segmentation can be outperformed by self-supervised (SSL) models. Moreover, while PEFT methods yield competitive performance, the results they produce differ little from those of the other adaptation methods, highlighting the critical role of the feature extractor in determining results. To our knowledge, this is the first study on the adaptation of VFM for FSS.
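The simplest adaptation method evaluated above, linear probing, trains only a lightweight pixel-wise classifier on top of frozen VFM features. The sketch below illustrates the idea with a toy convolutional backbone standing in for a real VFM (e.g., DINOv2); the backbone, head design, and training loop are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for a frozen VFM feature extractor (illustrative only)."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=4, stride=4)

    def forward(self, x):
        return self.conv(x)  # (B, dim, H/4, W/4) feature map

def linear_probe(backbone, images, masks, num_classes, steps=50, lr=1e-2):
    """Fit only a 1x1-conv classifier on frozen features (linear probing)."""
    for p in backbone.parameters():
        p.requires_grad_(False)  # the feature extractor stays frozen
    with torch.no_grad():
        feats = backbone(images)
    head = nn.Conv2d(feats.shape[1], num_classes, kernel_size=1)
    # Downsample the masks to the feature resolution for the pixel-wise loss.
    small = nn.functional.interpolate(
        masks[:, None].float(), size=feats.shape[-2:], mode="nearest"
    ).squeeze(1).long()
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(head(feats), small)
        loss.backward()
        opt.step()
    return head

# Tiny 1-shot example: one support image with a binary ground-truth mask.
torch.manual_seed(0)
img = torch.randn(1, 3, 32, 32)
mask = torch.zeros(1, 32, 32, dtype=torch.long)
mask[:, 16:, :] = 1
backbone = ToyBackbone()
head = linear_probe(backbone, img, mask, num_classes=2)
logits = head(backbone(img))
print(logits.shape)  # low-resolution class logits, upsampled at inference
```

PEFT methods such as LoRA differ only in which parameters are unfrozen: they insert small trainable low-rank adapters into the backbone instead of (or in addition to) training the head, while full fine-tuning updates every backbone weight.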