
NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation

Published 19 May 2024 in cs.CV and cs.AI (arXiv:2405.11476v1)

Abstract: Driven by segmentation models trained on large-scale data, such as SAM, research in one-shot segmentation has advanced significantly. Recent contributions such as PerSAM and Matcher, presented at ICLR 2024, take a similar approach: they leverage SAM with one or a few reference images to generate high-quality segmentation masks for target images. Specifically, they use raw encoded features to compute cosine similarity along the channel dimension between patches of the reference and target images, generating prompt points or boxes for the target images, a technique referred to as the matching strategy. However, relying solely on raw features can introduce bias and lacks robustness for such a complex task. To address this concern, we examine the issues of feature interaction and uneven distribution inherent in raw-feature-based matching. In this paper, we propose NubbleDrop, a simple and training-free method that enhances the validity and robustness of the matching strategy at no additional computational cost. The core idea is to randomly drop feature channels (setting them to zero) during the matching process, preventing the model from being misled by channels carrying deceptive information. This technique mimics discarding pathological nubbles, and it can be applied seamlessly to other similarity-computation scenarios. We conduct a comprehensive set of experiments covering a wide range of factors to demonstrate the effectiveness and validity of the proposed method, and our results show the significant improvements achieved by this simple and straightforward approach.
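The matching strategy described above, and the channel-dropping idea on top of it, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name, the `drop_ratio` parameter, and the use of a uniform random channel mask are assumptions for the sketch.

```python
import numpy as np

def nubbledrop_similarity(ref_feats, tgt_feats, drop_ratio=0.5, seed=0):
    """Cosine similarity between reference and target patch features
    after randomly dropping (zeroing) a fraction of feature channels.

    ref_feats: (N_ref, C) reference-patch features from the encoder.
    tgt_feats: (N_tgt, C) target-patch features from the encoder.
    drop_ratio: fraction of channels zeroed before matching (hypothetical knob).
    Returns an (N_ref, N_tgt) similarity matrix.
    """
    rng = np.random.default_rng(seed)
    num_channels = ref_feats.shape[1]
    keep = rng.random(num_channels) >= drop_ratio  # boolean channel mask
    ref = ref_feats * keep                         # zero out dropped channels
    tgt = tgt_feats * keep
    # L2-normalize along the channel dimension (epsilon avoids divide-by-zero)
    ref = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + 1e-8)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + 1e-8)
    return ref @ tgt.T

# Usage: pick the target patch most similar to any reference patch
# as a candidate prompt point for SAM.
ref = np.random.default_rng(1).normal(size=(4, 256))
tgt = np.random.default_rng(2).normal(size=(100, 256))
sim = nubbledrop_similarity(ref, tgt, drop_ratio=0.5)
prompt_idx = int(sim.max(axis=0).argmax())
```

With `drop_ratio=0.0` this reduces to the plain raw-feature cosine matching used by PerSAM and Matcher; the dropped channels are what distinguish NubbleDrop.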

References (47)
  1. k-means++: The advantages of careful seeding. In SODA, volume 7, pages 1027–1035, 2007.
  2. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  3. Yadolah Dodge. The Oxford dictionary of statistical terms. Oxford University Press, USA, 2003.
  4. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  5. The pascal visual object classes (voc) challenge. International journal of computer vision, 88:303–338, 2010.
  6. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5356–5364, 2019.
  7. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022.
  8. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  9. Cost aggregation with 4d convolutional swin transformer for few-shot segmentation. In European Conference on Computer Vision, pages 108–126. Springer, 2022.
  10. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. pmlr, 2015.
  11. Msanet: Multi-similarity and attention guidance for boosting few-shot segmentation. arXiv preprint arXiv:2206.09667, 2022.
  12. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904–4916. PMLR, 2021.
  13. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
  14. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213, 2022.
  15. Uncertainty-aware joint salient object and camouflaged object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10071–10081, 2021.
  16. Semantic-sam: Segment and recognize anything at any granularity. arXiv preprint arXiv:2307.04767, 2023.
  17. Fss-1000: A 1000-class dataset for few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2869–2878, 2020.
  18. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021.
  19. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
  20. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  21. Matcher: Segment anything with one shot using all-purpose feature matching. arXiv preprint arXiv:2305.13310, 2023.
  22. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
  23. Hypercorrelation squeeze for few-shot segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6941–6952, 2021.
  24. Attention-based joint detection of object and semantic part. arXiv preprint arXiv:2007.02419, 2020.
  25. Feature weighting and boosting for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 622–631, 2019.
  26. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  27. Zoom in and out: A mixed-scale triplet network for camouflaged object detection. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 2160–2170, 2022.
  28. Multi-scale interactive network for salient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9413–9422, 2020.
  29. Medical image segmentation using deep semantic-based methods: A review of techniques, applications and emerging trends. Information Fusion, 90:316–352, 2023.
  30. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  31. Ambiguous medical image segmentation using diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11536–11546, 2023.
  32. One-shot learning for semantic segmentation. arXiv preprint arXiv:1709.03410, 2017.
  33. Detecting statistical interactions with additive groves of trees. In Proceedings of the 25th international conference on Machine learning, pages 1000–1007, 2008.
  34. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
  35. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.
  36. Detecting statistical interactions from neural network weights. In International Conference on Learning Representations, 2018.
  37. How does this interaction affect me? interpretable attribution for feature interactions. Advances in neural information processing systems, 33:6147–6159, 2020.
  38. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  39. Images speak in images: A generalist painter for in-context visual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6830–6839, 2023.
  40. Seggpt: Segmenting everything in context. arXiv preprint arXiv:2304.03284, 2023.
  41. Glass segmentation with multi scales and primary prediction guiding. arXiv preprint arXiv:2402.08571, 2024.
  42. μ-Net: Medical image segmentation using efficient and effective deep supervision. Computers in Biology and Medicine, 160:106963, 2023.
  43. Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5217–5226, 2019.
  44. Feature-proxy transformer for few-shot segmentation. Advances in neural information processing systems, 35:6575–6588, 2022.
  45. Personalize segment anything model with one shot. In The Twelfth International Conference on Learning Representations, 2023.
  46. Input augmentation with sam: Boosting medical image segmentation with segmentation foundation model. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 129–139. Springer, 2023.
  47. Segment everything everywhere all at once. Advances in Neural Information Processing Systems, 36, 2024.
