DeepCamo: Underwater Camouflage Detection
- DeepCamo is a benchmark for underwater camouflaged object detection, offering 2,493 images with precise 'camouflage-only' segmentation annotations.
- It employs strict train-test splits and non-overlapping subsets to facilitate rigorous evaluation and expose the generalization gap in existing models.
- SLENet, incorporating multi-scale enhancement and localization guidance, demonstrates state-of-the-art performance on DeepCamo and similar COD benchmarks.
The DeepCamo Dataset is a curated benchmark designed for Underwater Camouflaged Object Detection (UCOD), addressing the acute challenges posed by optical distortions, water turbidity, and the complex camouflage traits of marine organisms. It provides high-quality, carefully annotated samples for evaluating and advancing computer vision algorithms in underwater camouflage scenarios. The dataset underpins a systematic benchmarking effort and supports the development of specialized detection networks, notably SLENet, which utilizes multi-scale enhancement and localization guidance for robust detection performance (Wang et al., 4 Sep 2025).
1. Composition and Annotation Protocols
DeepCamo comprises 2,493 underwater images covering 16 distinct marine species under diverse environmental conditions, including varying illumination and multi-object scenes. Annotation follows a "camouflage-only" protocol: each segmentation mask corresponds tightly to the camouflaged organism, with ambiguous or noisy regions filtered out. This design pushes models toward recognizing genuine camouflage rather than background artifacts.
The dataset is partitioned into an approximately 8:2 train-test split: 1,931 images for training and 562 for testing. In addition, a benchmarking subset, DeepCamo-full (1,907 images), is constructed with strict non-overlap with popular datasets such as COD10K, enabling rigorous comparison of state-of-the-art methods.
| Subset | Images | Species | Annotation Focus |
|---|---|---|---|
| Training | 1,931 | 16 | Camouflage-only |
| Testing | 562 | 16 | Camouflage-only |
| Benchmark (DeepCamo-full) | 1,907 | 16 | Camouflage-only, non-overlapping |
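The partition above is simple to sketch in code. The helper below assumes a flat list of image identifiers (the naming scheme is hypothetical, not DeepCamo's actual release format) and reproduces the published split sizes with a deterministic shuffle:

```python
import random

def split_deepcamo(image_ids, n_train=1931, seed=0):
    """Deterministic train/test split over image identifiers.

    The identifier scheme used below is illustrative; DeepCamo's actual
    file naming and official split lists may differ.
    """
    rng = random.Random(seed)
    ids = sorted(image_ids)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# 2,493 images -> 1,931 train / 562 test, matching the published split sizes
train, test = split_deepcamo([f"img_{i:04d}" for i in range(2493)])
print(len(train), len(test))  # 1931 562
```

Seeding the shuffle keeps the split reproducible across runs, which matters when comparing methods against a fixed benchmark partition.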
2. Benchmarking and Generalization Findings
A comprehensive evaluation of 12 representative camouflaged object detection models (spanning CNN and Transformer architectures) reveals consistent performance degradation—exceeding 15% in key localization metrics (S-measure, weighted F-measure, E-measure)—when deployed on DeepCamo-full, as compared to results on terrestrial camouflage datasets. Even top-performing models such as SAM2-UNet demonstrate significantly diminished accuracy, underscoring the distinct difficulty and generalization gap associated with underwater camouflage.
This empirical observation justifies the necessity for domain-specific approaches and motivates architectural innovations tailored to the underwater context.
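For concreteness, two of the reported metrics can be sketched with numpy. MAE follows directly from its definition; the F-measure below is the plain thresholded variant with the usual beta^2 = 0.3 convention, a simplified stand-in for the benchmark's weighted F-measure (which additionally weights errors by location):

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] prediction map and a binary mask."""
    return float(np.abs(pred - gt.astype(float)).mean())

def f_measure(pred, gt, thresh=0.5, beta2=0.3):
    """Plain F-measure on a binarized prediction (beta^2 = 0.3, the common
    COD/saliency convention). The benchmark's *weighted* F-measure also
    weights errors by spatial location; this is a simplified stand-in."""
    p = pred >= thresh
    tp = np.logical_and(p, gt).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return float((1 + beta2) * precision * recall /
                 max(beta2 * precision + recall, 1e-8))

gt = np.zeros((64, 64), dtype=bool)
gt[20:40, 20:40] = True
perfect = gt.astype(float)
print(mae(perfect, gt), f_measure(perfect, gt))  # 0.0 1.0
```

S-measure and E-measure involve structural and alignment terms beyond per-pixel error, which is why a model can score well on MAE yet degrade sharply on localization-sensitive metrics.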
3. SLENet: Architectural Overview and Rationale
The Semantic Localization and Enhancement Network (SLENet) was developed in direct response to the limitations exposed by DeepCamo-based benchmarking. SLENet integrates four modules:
- SAM2 encoder with adapter mechanisms for multi-scale feature extraction.
- Gamma-Asymmetric Enhancement (GAE) module for preserving fine details across scales.
- Localization Guidance Branch (LGB) for generating semantically rich location maps.
- Multi-Scale Supervised Decoder (MSSD) to refine predictions at multiple resolutions.
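The four modules above can be wired together as follows. Every module body is a stub standing in for learned components (the real network uses the adapted SAM2 encoder and learned convolutions), so only the dataflow — encoder features, per-scale GAE refinement, an LGB location map, and multi-scale decoding — is meaningful here:

```python
import numpy as np

# Dataflow-only sketch of SLENet's four stages. All module bodies are stubs;
# only the wiring reflects the description in the text.

def encoder(image):
    """Stand-in for the adapted SAM2 encoder: four feature scales."""
    h, w = image.shape[:2]
    return [np.zeros((h // s, w // s, 32)) for s in (4, 8, 16, 32)]

def gae(feat):
    """Stand-in for Gamma-Asymmetric Enhancement (per-scale refinement)."""
    return feat

def lgb(feats):
    """Stand-in for the Localization Guidance Branch: one coarse location map."""
    return np.zeros(feats[-1].shape[:2])

def decoder(feats, location_map):
    """Stand-in for the Multi-Scale Supervised Decoder: one mask per scale."""
    return [np.zeros(f.shape[:2]) for f in feats]

image = np.zeros((256, 256, 3))
feats = [gae(f) for f in encoder(image)]
masks = decoder(feats, lgb(feats))
print([m.shape for m in masks])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Supervising one mask per resolution (rather than only the finest) is what the "multi-scale supervised" decoder refers to: each scale receives its own loss during training.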
Gamma-Asymmetric Enhancement (GAE) Module
The GAE module processes each input feature level $F_i$ (for $i = 1, \dots, 4$) via four sequential branches employing asymmetric and dilated convolutions:
- First branch: $F_i^{1} = \mathrm{AMP}(F_i)$
- Subsequent branches: $F_i^{k} = \mathrm{AMP}\big(D_{k}(F_i) \oplus F_i^{k-1}\big)$ for $k = 2, 3, 4$
- Output refinement: $F_i' = \mathrm{CA}\big(\mathrm{Cat}(F_i^{1}, F_i^{2}, F_i^{3}, F_i^{4})\big) + \gamma \cdot F_i$
Here, $\mathrm{AMP}$ denotes repeated asymmetric convolution-max pooling pairs, $D_k$ is dilated convolution with a branch-specific dilation rate, $\mathrm{CA}$ channel attention, $\oplus$ element-wise summation, and $\gamma$ is a learnable scaling factor.
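The sequential branch structure described above (AMP, dilated convolution, channel attention, and a learnable residual scale) can be sketched as a toy in numpy. All learned operators are replaced by simple hand-rolled stand-ins; only the branch wiring is faithful to the description:

```python
import numpy as np

# Toy sketch of the GAE branch recursion. AMP, dilated convolution, and
# channel attention are replaced by fixed stand-ins (the real module uses
# learned convolutions); only the four-branch wiring mirrors the text.

def amp(x):
    """Stand-in for asymmetric-conv + max-pool pairs: a 3-tap row smoothing."""
    return (np.roll(x, 1, axis=0) + x + np.roll(x, -1, axis=0)) / 3.0

def dilated(x, rate):
    """Stand-in for dilated convolution: average with rate-spaced neighbors."""
    return (np.roll(x, rate, axis=1) + x + np.roll(x, -rate, axis=1)) / 3.0

def channel_attention(x):
    """Stand-in for channel attention: sigmoid-gated channel reweighting."""
    w = x.mean(axis=(0, 1), keepdims=True)
    return x * (1.0 / (1.0 + np.exp(-w)))

def gae_block(feat, gamma=0.5):
    branches = [amp(feat)]                       # first branch
    for k in range(2, 5):                        # branches 2..4, each fed by
        branches.append(amp(dilated(feat, k) + branches[-1]))  # its predecessor
    fused = channel_attention(np.concatenate(branches, axis=-1))
    # collapse the concatenated branches back to the input width (mean here
    # stands in for a learned 1x1 projection), then add the scaled residual
    h, w, c = feat.shape
    fused = fused.reshape(h, w, 4, c).mean(axis=2)
    return fused + gamma * feat

out = gae_block(np.random.rand(8, 8, 16))
print(out.shape)  # (8, 8, 16)
```

The residual term `gamma * feat` lets the module learn how strongly to mix the enhanced features back with the originals, preserving fine detail when enhancement is unhelpful.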
Localization Guidance Branch (LGB)
The LGB fuses multi-scale features in a bottom-up fashion, enabling global cues to inform finer-scale segmentation. Features $F_i'$ (compressed to unified channel dimensions) are integrated recursively:
- Initial fusion: $G_4 = \mathrm{CBR}(F_4')$
- Recursive enhancement: $G_i = \mathrm{CBR}\big(F_i' \oplus \mathrm{Up}(G_{i+1})\big)$ for $i = 3, 2, 1$
- Localization map: $L = \mathrm{Conv}(G_1)$
Here, $\oplus$ denotes element-wise summation, $\mathrm{Up}$ is upsampling to the next finer scale, and $\mathrm{CBR}$ combines convolution, batch normalization, and ReLU activation.
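The bottom-up fusion described above can be sketched as a toy in numpy. CBR is stood in by a bare ReLU and upsampling by nearest-neighbor repetition; only the recursive deep-to-shallow wiring is faithful to the description:

```python
import numpy as np

# Toy sketch of the LGB fusion recursion: the deepest features are
# progressively upsampled and summed into finer scales, ending in a single
# localization map. Learned layers are replaced by fixed stand-ins.

def cbr(x):
    """Stand-in for conv + batch-norm + ReLU (here just the nonlinearity)."""
    return np.maximum(x, 0.0)

def upsample2x(x):
    """Nearest-neighbor 2x spatial upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def lgb_fuse(feats):
    """feats: 4 feature maps, finest first, each half the previous resolution."""
    g = cbr(feats[-1])                  # initial fusion at the deepest scale
    for f in reversed(feats[:-1]):      # recursive enhancement, deep -> shallow
        g = cbr(f + upsample2x(g))      # element-wise sum after upsampling
    return g.mean(axis=-1)              # localization map (stand-in for a conv)

feats = [np.random.randn(32 // s, 32 // s, 8) for s in (1, 2, 4, 8)]
loc = lgb_fuse(feats)
print(loc.shape)  # (32, 32)
```

Because the recursion starts from the coarsest level, the semantically rich global context reaches every finer scale before the final localization map is produced.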
4. Performance and Comparative Results
SLENet achieves state-of-the-art results on DeepCamo-Test and demonstrates strong generalization on other COD benchmarks (COD10K, CAMO, CHAMELEON):
- DeepCamo metrics: S-measure ($S_\alpha$) of 0.869, mean E-measure ($E_\phi$) of 0.930, weighted F-measure ($F^w_\beta$) of 0.800, and MAE of 0.022.
- SLENet outperforms previous best methods, such as SAM2-UNet, with notable gains in localization accuracy and object boundary preservation.
- Consistency across datasets indicates the effectiveness of GAE and LGB for both underwater and generic camouflage detection.
5. Significance in Underwater Object Detection and Broader COD Research
DeepCamo represents a pivotal resource for advancing algorithms in the underexplored domain of UCOD. Its refined annotations, strict splits, and challenging content reveal fundamental limitations in general-purpose camouflage detection models, catalyzing innovations like SLENet.
The dataset's impact extends beyond underwater contexts. The successful translation of methods developed on DeepCamo to other COD benchmarks demonstrates the relevance of multi-scale enhancement and guided localization principles for object detection tasks where subtle visual cues predominate.
A plausible implication is that finer-grained annotation and emphasis on true camouflage, rather than mere background segmentation, are necessary for achieving robust generalization in COD models. This suggests future work should further integrate semantically guided multi-scale fusion and explicitly model the unique traits of diverse camouflage environments.
6. Applications and Future Research Directions
The DeepCamo Dataset and its associated benchmark findings have immediate utility in marine ecological monitoring, automated underwater exploration, and robust vision-based systems for biodiversity and surveillance tasks. By exposing the generalization gap, DeepCamo motivates the development of architectures sensitive to environmental context, optical distortion, and biologically inspired camouflage mechanisms.
Future research will likely focus on:
- Integrating depth estimation and multimodal fusion, as exemplified by related RGB-D COD approaches (Liu et al., 9 May 2024).
- Extending explainable detection frameworks leveraging auxiliary attention and localization maps (Lv et al., 2022).
- Expanding annotated resources for open-vocabulary, multimodal COD tasks (Pang et al., 2023, Ruan et al., 24 Sep 2024).
The DeepCamo Dataset thus constitutes a foundational benchmark for the ongoing evolution of camouflaged object detection research, with demonstrated impact on both task performance and methodological innovation across computer vision subfields.