Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects (2312.07374v3)
Abstract: Camouflaged object detection (COD) approaches heavily rely on pixel-level annotated datasets. Weakly-supervised COD (WSCOD) approaches use sparse annotations such as scribbles or points to reduce annotation effort, but this can lead to decreased accuracy. The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts such as points. However, manual prompting is not always feasible, as it may not be accessible in real-world applications. Additionally, it only provides localization information rather than semantic information, which can intrinsically cause ambiguity in interpreting the targets. In this work, we aim to eliminate the need for manual prompts. The key idea is to employ Cross-modal Chains of Thought Prompting (CCTP) to reason visual prompts from the semantic information given by a generic text prompt. To that end, we introduce a per-instance test-time adaptation mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts from the generic task prompt for WSCOD. In particular, CCTP maps a single generic text prompt onto image-specific consensus foreground and background heatmaps using vision-language models, acquiring reliable visual prompts. Moreover, to adapt the visual prompts at test time, we further propose Progressive Mask Generation (PMG), which iteratively reweights the input image, guiding the model to focus on the targets in a coarse-to-fine manner. Crucially, all network parameters are fixed, avoiding the need for additional training. Experiments on three benchmarks demonstrate that GenSAM outperforms point-supervision approaches and achieves results comparable to scribble-supervision ones, relying solely on general task descriptions as prompts. Our code is available at: https://lwpyh.github.io/GenSAM/.
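The abstract's pipeline (consensus heatmap → point prompt → segmentation → image reweighting) can be sketched as a minimal test-time loop. This is a hedged illustration only: the heatmap here stands in for CCTP's vision-language-model consensus, and `segment` stands in for the frozen SAM predictor; the function names, the thresholding rule, and the `alpha` reweighting factor are assumptions, not the paper's actual implementation.

```python
import numpy as np

def peak_point(heatmap):
    """Turn a consensus foreground heatmap into a single point prompt (row, col).

    In GenSAM this heatmap would come from CCTP via vision-language models;
    here it is simply taken as given.
    """
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

def segment(image, fg_point):
    """Stand-in for the frozen SAM: grow a mask of pixels whose intensity
    is close to the prompted point's value. (A toy substitute, not SAM.)"""
    seed_val = image[fg_point]
    return (np.abs(image - seed_val) < 0.2).astype(float)

def gensam_step(image, heatmap, alpha=0.5):
    """One PMG-style iteration: derive a point prompt from the heatmap,
    segment, then reweight the image so the next pass focuses on the target
    region in a coarse-to-fine manner. No parameters are updated."""
    fg_point = peak_point(heatmap)
    mask = segment(image, fg_point)
    reweighted = image * (alpha + (1.0 - alpha) * mask)
    return mask, reweighted

# Toy usage: a bright 3x3 "camouflaged object" on a dark background,
# with a heatmap that (ideally) agrees with the object location.
image = np.zeros((8, 8))
image[2:5, 2:5] = 1.0
heatmap = image.copy()  # pretend VLM consensus
mask, reweighted = gensam_step(image, heatmap)
```

The design point the sketch captures is that everything is frozen: only the visual prompt and the input weighting change between iterations, matching the abstract's claim that no additional training is needed.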