MAS-SAM: Segment Any Marine Animal with Aggregated Features (2404.15700v2)
Abstract: Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural-light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of SAM's decoder might lead to the loss of fine-grained object details. To address these issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation (MAS), which integrates effective adapters into SAM's encoder and constructs a pyramidal decoder. More specifically, we first build a new SAM encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method can extract richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that MAS-SAM obtains better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.
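To make the idea of progressively aggregating multi-scale features more concrete, here is a minimal, dependency-free sketch of a coarse-to-fine pass over a toy feature pyramid. It is not the MAS-SAM implementation: the function names (`upsample2x`, `fuse`, `progressive_aggregate`), the nearest-neighbour upsampling, and the element-wise averaging used in place of a learned fusion block are all assumptions made for illustration. The abstract does not specify how HEM, PPD, or FAM are built internally.

```python
# Illustrative only: coarse-to-fine aggregation over a feature pyramid.
# NOT the MAS-SAM code; names and the fusion rule are assumptions.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(coarse_up, fine):
    """Hypothetical stand-in for a learned fusion block: element-wise mean."""
    return [[(c + f) / 2.0 for c, f in zip(crow, frow)]
            for crow, frow in zip(coarse_up, fine)]


def progressive_aggregate(pyramid):
    """Aggregate maps ordered coarse -> fine, each level twice the previous size.

    Returns one intermediate result per level; the last entry is the finest.
    """
    current = pyramid[0]
    outputs = [current]
    for fine in pyramid[1:]:
        current = fuse(upsample2x(current), fine)
        outputs.append(current)
    return outputs


# Toy pyramid: 1x1, 2x2 and 4x4 single-channel maps.
pyramid = [
    [[1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
    [[1.0] * 4 for _ in range(4)],
]
outputs = progressive_aggregate(pyramid)
final = outputs[-1]
print(len(final), len(final[0]))  # spatial size of the finest output
```

Each level's output could, in principle, be supervised or inspected separately, which is one common motivation for progressive decoders; whether MAS-SAM does so is not stated in the abstract.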
- Tianyu Yan
- Zifu Wan
- Xinhao Deng
- Pingping Zhang
- Yang Liu
- Huchuan Lu