Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM (2404.04996v1)
Abstract: As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods do not excel at extracting long-range contextual features and overlook the connectivity between discrete pixels. Recently, the Segment Anything Model (SAM) has offered a universal framework for general segmentation tasks. Unfortunately, because it is trained on natural images, SAM lacks prior knowledge of marine images. In addition, SAM's single-position prompt is insufficient for prior guidance. To address these issues, we propose a novel feature learning framework, named Dual-SAM, for high-performance MAS. To this end, we first introduce a dual structure following SAM's paradigm to enhance feature learning on marine images. Then, we propose a Multi-level Coupled Prompt (MCP) strategy to provide comprehensive underwater prior information and to enhance the multi-level features of SAM's encoder with adapters. Subsequently, we design a Dilated Fusion Attention Module (DFAM) to progressively integrate multi-level features from SAM's encoder. Finally, instead of directly predicting the masks of marine animals, we propose a Criss-Cross Connectivity Prediction (C$^3$P) paradigm to capture the inter-connectivity between discrete pixels. With dual decoders, it generates pseudo-labels and achieves mutual supervision for complementary feature representations, resulting in considerable improvements over previous techniques. Extensive experiments verify that our proposed method achieves state-of-the-art performance on five widely used MAS datasets. The code is available at https://github.com/Drchip61/Dual_SAM.
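To make the idea of predicting pixel connectivity rather than a mask more concrete, the following is a minimal sketch of one plausible reading. It assumes "criss-cross" means the four immediate neighbours (up, down, left, right); the abstract does not give the exact definition, so the neighbourhood, the function names `criss_cross_connectivity` and `mask_from_connectivity`, and the recovery rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Mask = List[List[int]]

# Assumed criss-cross neighbourhood: up, down, left, right.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def criss_cross_connectivity(mask: Mask) -> List[Mask]:
    """Return one binary map per direction (hypothetical helper).

    Entry [d][y][x] is 1 when pixel (y, x) and its neighbour in
    direction d are both foreground, and 0 otherwise, including
    when the neighbour falls outside the image.
    """
    height, width = len(mask), len(mask[0])
    maps = []
    for dy, dx in OFFSETS:
        conn = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[y][x] == 1 and mask[ny][nx] == 1:
                    conn[y][x] = 1
        maps.append(conn)
    return maps


def mask_from_connectivity(maps: List[Mask]) -> Mask:
    """Recover a mask: a pixel is foreground if it is connected in any direction."""
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [int(any(m[y][x] for m in maps)) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    toy_mask = [
        [0, 1, 1],
        [0, 1, 0],
        [0, 0, 0],
    ]
    conn_maps = criss_cross_connectivity(toy_mask)
    print(mask_from_connectivity(conn_maps) == toy_mask)
```

Under this reading, a network would be trained to predict the four directional maps, and the final mask would be derived from them. Note that an isolated single foreground pixel has no foreground neighbour, so this simple recovery rule would drop it.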