
VRP-SAM: SAM with Visual Reference Prompt (2402.17726v3)

Published 27 Feb 2024 in cs.CV

Abstract: In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to use annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM uses an annotated reference image to understand a specific object and then segments that object in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To improve the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and perform cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM
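
The abstract describes the VRP encoder only at a high level. As a rough illustration of the idea, the sketch below shows one plausible way a visual-reference prompt encoder could turn an annotated reference image into sparse prompt embeddings for a SAM-style mask decoder: learnable queries first attend to mask-pooled reference features, then to target features. This is a hypothetical sketch, not the authors' implementation (see the linked repository for that); all module names, shapes, and design choices here are illustrative assumptions.

```python
# Hypothetical VRP-style prompt encoder sketch (NOT the paper's code).
# Assumptions: frozen SAM-style image features of shape (B, C, H, W),
# and a reference annotation given as a binary mask of shape (B, 1, H, W).
import torch
import torch.nn as nn

class VRPEncoderSketch(nn.Module):
    def __init__(self, dim=256, num_queries=8, num_heads=8):
        super().__init__()
        # Learnable queries that will carry the reference object's identity.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.ref_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.tgt_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, ref_feat, ref_mask, tgt_feat):
        B, C, H, W = ref_feat.shape
        # Suppress background reference features with the annotation mask
        # (a simple choice; the paper may use a different interaction scheme).
        ref_tokens = (ref_feat * ref_mask).flatten(2).transpose(1, 2)  # (B, HW, C)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)                # (B, Q, C)
        q, _ = self.ref_attn(q, ref_tokens, ref_tokens)   # absorb object info
        tgt_tokens = tgt_feat.flatten(2).transpose(1, 2)               # (B, HW, C)
        prompts, _ = self.tgt_attn(q, tgt_tokens, tgt_tokens)  # ground in target
        # (B, Q, C): could serve as sparse prompt embeddings for SAM's decoder.
        return prompts

# Usage with dummy tensors standing in for frozen image-encoder features:
enc = VRPEncoderSketch()
ref = torch.randn(1, 256, 64, 64)
mask = torch.randint(0, 2, (1, 1, 64, 64)).float()
tgt = torch.randn(1, 256, 64, 64)
print(enc(ref, mask, tgt).shape)  # torch.Size([1, 8, 256])
```

In this reading, only the prompt encoder would be trained (episodically, per the abstract's meta-learning strategy), which is consistent with the paper's claim of minimal learnable parameters while SAM itself stays frozen.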
