Semantic-aware SAM for Point-Prompted Instance Segmentation (2312.15895v2)
Abstract: Single-point annotation, which aims to minimize labelling costs, is becoming increasingly prominent in research on visual tasks. Recently, visual foundation models such as Segment Anything (SAM) have gained widespread use owing to their robust zero-shot capabilities and exceptional annotation performance. However, SAM's class-agnostic output and high confidence in local segmentation introduce 'semantic ambiguity', which poses a challenge for precise category-specific segmentation. In this paper, we introduce a cost-effective category-specific segmenter built on SAM. To tackle this challenge, we devise a Semantic-Aware Instance Segmentation Network (SAPNet) that integrates Multiple Instance Learning (MIL) with matching capability and SAM with point prompts. SAPNet selects the most representative mask proposals generated by SAM to supervise segmentation, with a specific focus on object category information. Moreover, we introduce Point Distance Guidance and a Box Mining Strategy to mitigate two inherent challenges in weakly supervised segmentation: the 'group' and 'local' issues. These strategies further improve overall segmentation performance. Experimental results on Pascal VOC and COCO demonstrate the promising performance of SAPNet, highlighting its semantic matching capabilities and its potential to advance point-prompted instance segmentation. The code will be made publicly available.