Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts (2404.00741v1)
Abstract: The goal of interactive image segmentation is to delineate specific regions within an image via visual or language prompts. Low-latency, high-quality interactive segmentation with diverse prompts remains challenging for existing specialist and generalist models. Specialist models support only a limited set of prompts and rely on task-specific designs. Because they encode the image and the visual prompts jointly, the image must be re-encoded every time the prompt is updated, which leads to high latency. Generalist models, exemplified by the Segment Anything Model (SAM), have recently excelled in prompt diversity and efficiency, lifting image segmentation into the foundation model era. However, SAM still lags behind state-of-the-art specialist models in producing high-quality segmentations, despite being trained with 100× more segmentation masks. In this work, we analyze the architectural differences between the two types of models. We observe that the dense representation and fusion of visual prompts are the key design choices behind the high segmentation quality of specialist models. In light of this, we reintroduce this dense design into generalist models to enable generalist models with high segmentation quality. To represent diverse visual prompts densely, we propose using a dense map that captures five prompt types: clicks, boxes, polygons, scribbles, and masks. Building on this, we propose SegNext, a next-generation interactive segmentation approach offering low latency, high quality, and diverse prompt support. Our method outperforms current state-of-the-art methods on HQSeg-44K and DAVIS, both quantitatively and qualitatively.
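The abstract states that all five prompt types are captured in a single dense map but does not give its layout. The sketch below is one possible way to rasterize such prompts into a map with the same height and width as the image. Several parts of it are assumptions made for illustration, not the authors' implementation: the three-channel layout (positive prompts, negative prompts, previous mask), the click/scribble radius, and the choice to fill boxes and polygons. The `Prompt` class and `render_prompt_map` function are likewise hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


@dataclass
class Prompt:
    """Hypothetical prompt record; field names are illustrative only."""
    kind: str                             # "click", "box", "polygon", "scribble", or "mask"
    points: Sequence[Point] = ()          # geometry for the four sparse prompt types
    positive: bool = True                 # foreground (True) or background (False)
    mask: Sequence[Sequence[float]] = ()  # full-resolution mask, used only when kind == "mask"


def _inside_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    for i in range(len(poly)):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % len(poly)]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def _dist_to_segment(px: float, py: float, a: Point, b: Point) -> float:
    """Euclidean distance from (px, py) to the line segment a-b."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _covers(p: Prompt, x: int, y: int, radius: float) -> bool:
    """Whether pixel (x, y) is marked by the sparse prompt p (assumed rendering rules)."""
    if p.kind == "click":  # disk of the given radius around the click
        cx, cy = p.points[0]
        return math.hypot(x - cx, y - cy) <= radius
    if p.kind == "box":  # filled rectangle between two opposite corners
        (x0, y0), (x1, y1) = p.points
        return min(x0, x1) <= x <= max(x0, x1) and min(y0, y1) <= y <= max(y0, y1)
    if p.kind == "polygon":  # filled polygon interior
        return _inside_polygon(x, y, p.points)
    if p.kind == "scribble":  # polyline thickened by `radius`
        pts = list(p.points)
        segments = list(zip(pts, pts[1:])) or [(pts[0], pts[0])]
        return any(_dist_to_segment(x, y, a, b) <= radius for a, b in segments)
    raise ValueError(f"unknown prompt kind: {p.kind!r}")


def render_prompt_map(height: int, width: int, prompts: Sequence[Prompt],
                      radius: float = 2.0) -> List[List[List[float]]]:
    """Render prompts into a [3][height][width] map (hypothetical layout):
    channel 0 = positive prompts, 1 = negative prompts, 2 = previous mask."""
    dense = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for p in prompts:
        if p.kind == "mask":  # copy the mask prompt into its own channel
            for y in range(height):
                for x in range(width):
                    dense[2][y][x] = float(p.mask[y][x])
            continue
        channel = dense[0] if p.positive else dense[1]
        for y in range(height):
            for x in range(width):
                if _covers(p, x, y, radius):
                    channel[y][x] = 1.0
    return dense


# Usage: one prompt of each type on an 8x8 image.
example = render_prompt_map(8, 8, [
    Prompt("click", [(1, 1)]),
    Prompt("box", [(4, 4), (6, 6)], positive=False),
    Prompt("polygon", [(0, 5), (3, 5), (3, 7)]),
    Prompt("scribble", [(7, 0), (7, 3)], positive=False),
    Prompt("mask", mask=[[1 if x < 2 else 0 for x in range(8)] for _ in range(8)]),
], radius=1.0)
for row in example[0]:  # print the positive channel as 0/1 rows
    print("".join(str(int(v)) for v in row))
```

Because the map is pixel-aligned with the image, it can be fused with image features location by location. The abstract credits this kind of dense design for the quality of specialist models. In a SAM-style pipeline, only this map would need to be re-rendered when the user adds a prompt, while the image embedding is computed once. This is the latency property the abstract contrasts with specialist models that encode image and prompts jointly.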
Authors: Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer