DeepCamo: Underwater Object Detection Benchmark
- DeepCamo is a benchmark dataset for underwater camouflaged object detection, pairing a comprehensive collection of marine images with precise manual annotations.
- The dataset integrates images from shallow to deep-water habitats, applies systematic augmentation, and uses stratified train/test splits to provide practical, reproducible evaluation protocols.
- Evaluation on DeepCamo reveals that state-of-the-art models face significant performance drops in underwater scenarios, highlighting unique challenges in optical distortion and turbulence.
Underwater Camouflaged Object Detection (UCOD) requires the identification of marine organisms that visually blend into complex aquatic environments. DeepCamo is a benchmark dataset specifically assembled to address the limitations inherent in previous camouflaged object detection (COD) datasets, focusing on the nuanced challenges present in underwater scenes such as severe optical distortion, high turbidity, and morphological complexity of marine life. DeepCamo establishes a standardized testbed for UCOD with rigorous annotation and stratification protocols and has become integral to evaluating both convolutional and transformer-based state-of-the-art object detection models (Wang et al., 4 Sep 2025).
1. Dataset Construction and Composition
DeepCamo is derived from six publicly available datasets: MAS3K, RMAS, UFO120 (underwater scenes), and CAMO, COD10K, CHAMELEON (general COD datasets). The dataset compiles images from shallow (1 m) to deep-water (up to 30 m) habitats under varying conditions—sunlit and shadowed illumination, clear to murky water, chromatic aberration, haze, motion and defocus blur, vegetative and rocky occlusion, and groupings of multiple objects.
Source images are acquired via Remotely Operated Vehicles (ROVs) and handheld underwater camera housings, with native resolutions ranging from 640×480 to 4000×3000 (roughly 0.3 MP to 12 MP). The total dataset comprises 2,493 images containing approximately 4,200 annotated object instances (averaging 1.7 objects per image) across 16 marine species, including octopus, cuttlefish, flatfish, scorpionfish, stonefish, seahorse, pipefish, and sea urchin.
| Species | Images | Instances |
|---|---|---|
| Octopus | 320 | 380 |
| Cuttlefish | 290 | 310 |
| Flatfish | 250 | 280 |
| Scorpionfish/Stonefish | 300 | 340 |
| Seahorse/Pipefish | 200 | 220 |
| Sea urchin and other invertebrates | 1,133 | ~2,700 |
Annotation utilizes manual polygonal segmentation in LabelMe by two graduate students, with a marine biology expert adjudicating boundary disagreements. Inter-annotator consistency exceeds 0.95 IoU on a 5% image subset. Images are standardized to 352×352 via bilinear interpolation during network training; original formats are JPEG/PNG. Data augmentation includes random horizontal/vertical flips (p=0.5), rotation (±15°), color jitter (brightness ±20%, contrast ±15%, saturation ±10%), and Gaussian blur (kernel 3×3, σ∈[0.1,1.0], p=0.3).
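A minimal sketch of this augmentation recipe, assuming a torchvision-based pipeline (the library used by the authors is not stated), is:

```python
# Illustrative training-time augmentation matching the parameters above.
# Note: in a real segmentation pipeline, the geometric transforms must be
# applied jointly to the image and its ground-truth mask.
from torchvision import transforms
from torchvision.transforms import InterpolationMode

train_transform = transforms.Compose([
    # Standardize resolution to 352x352 via bilinear interpolation.
    transforms.Resize((352, 352), interpolation=InterpolationMode.BILINEAR),
    # Geometric augmentation.
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    # Photometric augmentation: brightness +/-20%, contrast +/-15%, saturation +/-10%.
    transforms.ColorJitter(brightness=0.2, contrast=0.15, saturation=0.1),
    # Gaussian blur (3x3 kernel, sigma in [0.1, 1.0]) applied with probability 0.3.
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 1.0))], p=0.3),
    transforms.ToTensor(),
])
```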
Train/test splits are stratified by species and object count: DeepCamo-train (1,931 images, 77.4%) and DeepCamo-test (562 images, 22.6%). DeepCamo-full is an unbiased 1,907-image subset strictly excluding any image overlapping COD10K training data.
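Such a split could be reproduced along the following lines with scikit-learn; the exact stratification key is not published, so combining species with a capped per-image object count is an assumption:

```python
# Illustrative stratified split; the authors' exact grouping is not specified.
from sklearn.model_selection import train_test_split

def stratified_split(samples, test_size=0.226, seed=42):
    """samples: list of dicts with keys 'path', 'species', 'num_objects'."""
    # Combine species and a coarse object-count bin into one stratum label.
    strata = [f"{s['species']}_{min(s['num_objects'], 3)}" for s in samples]
    train, test = train_test_split(
        samples, test_size=test_size, stratify=strata, random_state=seed)
    return train, test
```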
2. Underwater Camouflaging Phenomena and Data Characteristics
DeepCamo encapsulates the critical traits of underwater camouflage. Scenes present intricate background textures (coral, rocks, kelp, sand ripples) frequently matching object appearance, and heavy color distortion from green/blue wavelength-dependent absorption. Water turbidity induces haze and local blur, while occlusion from vegetation or rocks and multi-object groupings complicate object boundaries.
Objects are often diminutive—40% occupy less than 3% of the image area (small), 45% labeled medium (3%–10%), and 15% large (>10%). Aspect ratio statistics show a mean of 1.2 (SD 0.45), spanning nearly square to elongated (up to 3:1). Instances are highly morphologically variable, e.g., octopus arms and urchin spines, and overlapping silhouettes from adjacent individuals. Representative cases include a 30×30 px cuttlefish in murky water (small), a 150×80 px flatfish on sand (medium), and >300×200 px stonefish camouflaged amid coral (large).
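A small helper illustrates how instances can be bucketed by these relative-area thresholds; the function is illustrative and not part of any released toolkit:

```python
import numpy as np

def size_category(mask: np.ndarray) -> str:
    """Classify an instance by the fraction of image area it covers, using the
    DeepCamo thresholds quoted above (<3% small, 3-10% medium, >10% large)."""
    area_ratio = mask.astype(bool).sum() / mask.size
    if area_ratio < 0.03:
        return "small"
    if area_ratio <= 0.10:
        return "medium"
    return "large"
```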
3. Evaluation Metrics and Protocols
DeepCamo employs standardized metrics consistent with the COD benchmarking suite:
- Precision ($P$) and Recall ($R$): evaluate detection accuracy at a given binarization threshold.
- Weighted F-measure ($F_\beta^w$), where the weighted precision $P^w$ and recall $R^w$ weight pixel errors by their location and spatial dependency:
  $$F_\beta^w = \frac{(1+\beta^2)\, P^w \cdot R^w}{\beta^2\, P^w + R^w},$$
  with $\beta^2 = 0.3$, emphasizing precision.
- Mean Absolute Error (MAE):
  $$\mathrm{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \bigl| P(x,y) - G(x,y) \bigr|,$$
  where $P(x,y)$ is the predicted map and $G(x,y)$ the ground-truth mask.
- S-measure ($S_\alpha$): combines object-aware ($S_o$) and region-aware ($S_r$) structural similarity, $S_\alpha = \alpha S_o + (1-\alpha) S_r$ with $\alpha = 0.5$ [Fan et al., CVPR 2020].
- E-measure ($E_\phi$): aggregates local pixel matching and global image-level statistics via the enhanced-alignment matrix [Fan et al., TIP 2018].
Evaluation requires resizing predicted probability maps to ground-truth resolution and computing threshold-dependent metrics at 201 binarization thresholds ($\tau \in [0, 1]$, step 0.005); threshold-dependent scores such as $E_\phi$ are reported as the maximum across all thresholds. MAE, $S_\alpha$, and $F_\beta^w$ are computed on continuous maps without binarization. No UCOD-specific metrics are introduced beyond the COD baseline.
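The protocol can be made concrete with a short sketch that computes MAE on the continuous map and the maximal E-measure over the 201 thresholds, following the commonly used reference formulation of the enhanced-alignment measure; this is an illustration rather than the benchmark's official evaluation code, and the weighted F-measure and S-measure are omitted for brevity:

```python
import numpy as np

def e_measure(pred_bin: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Enhanced-alignment measure for one binarized prediction (sketch of the
    commonly used reference formulation)."""
    fm, g = pred_bin.astype(np.float64), gt.astype(np.float64)
    if g.sum() == 0:              # empty ground truth: reward predicting nothing
        enhanced = 1.0 - fm
    elif (1 - g).sum() == 0:      # ground truth covers the whole image
        enhanced = fm
    else:
        align_fm, align_g = fm - fm.mean(), g - g.mean()
        align = 2 * align_g * align_fm / (align_g ** 2 + align_fm ** 2 + eps)
        enhanced = (align + 1) ** 2 / 4
    return enhanced.sum() / (g.size - 1 + eps)

def evaluate(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred: continuous map in [0, 1], resized to ground-truth resolution;
    gt: binary mask (0/1)."""
    mae = np.abs(pred - gt.astype(np.float64)).mean()
    thresholds = np.arange(0.0, 1.0 + 1e-9, 0.005)   # 201 thresholds
    e_max = max(e_measure(pred >= t, gt) for t in thresholds)
    return {"MAE": mae, "E_phi_max": e_max}
```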
4. Baseline Model Performance and Dataset Challenge
Twelve leading COD models, including CNNs (SINet, SINet-V2, PFNet, PreyNet, BCNet, FEDER) and transformer architectures (PUENet, FSNet, HitNet, Dual-SAM, MAMIFNet, SAM2-UNet), were benchmarked on DeepCamo-full. Results indicate pronounced performance degradation in underwater scenarios relative to terrestrial datasets, with average scores dropping by more than 15% and MAE values nearly doubling for some models. The most significant failures arise in scenes with heavy turbidity, high object density, and strong color casts, where models frequently miss small objects or produce fragmented masks.
| Model | $S_\alpha$ ↑ | $E_\phi$ ↑ | $F_\beta^w$ ↑ | MAE ↓ |
|---|---|---|---|---|
| SINet | 0.665 | 0.736 | 0.417 | 0.051 |
| PFNet | 0.674 | 0.756 | 0.428 | 0.049 |
| PreyNet | 0.677 | 0.738 | 0.444 | 0.058 |
| FEDER | 0.693 | 0.772 | 0.474 | 0.049 |
| PUENet | 0.728 | 0.802 | 0.551 | 0.047 |
| SAM2-UNet | 0.741 | 0.801 | 0.557 | 0.045 |
5. Implementation Guidelines and Recommendations
For optimizing UCOD methods with DeepCamo, several best practices are recommended:
- Augmentation: Employ color-cast correction (white-balancing), simulated haze, and elastic deformation to replicate water flow and distortions.
- Pretraining: Fine-tune natural-image COD models using Adapter modules on DeepCamo to acclimate to underwater color statistics.
- Training settings: image size 352×352, batch size 16, AdamW optimizer with cosine learning-rate decay. Global localization branch loss balancing uses a dynamic weight μ, initialized at 0.6 and decayed to 0.1 over the training epochs.
A plausible implication is that employing underwater-specific augmentation and fine-tuning approaches may mitigate some of the documented performance drops under UCOD constraints.
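A hedged sketch of these training settings follows; the stand-in network, the learning-rate value, the epoch count, and the linear decay schedule for μ are assumptions, since only the optimizer family, image size, batch size, and the 0.6→0.1 weight range are specified above:

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 100                               # illustrative; not stated in the source
model = nn.Conv2d(3, 1, 3, padding=1)      # stand-in for a real UCOD network
optimizer = AdamW(model.parameters(), lr=1e-4)   # LR value is an assumption
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

def localization_weight(epoch: int, mu_start: float = 0.6, mu_end: float = 0.1) -> float:
    """Dynamic weight for the global localization branch loss, decayed from
    0.6 to 0.1 over training (linear decay is an assumption)."""
    t = epoch / max(EPOCHS - 1, 1)
    return mu_start + (mu_end - mu_start) * t

for epoch in range(EPOCHS):
    mu = localization_weight(epoch)
    # ... one pass over 352x352 crops in batches of 16, with total loss
    #     L = L_seg + mu * L_loc (segmentation + global localization branches) ...
    scheduler.step()
```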
6. Extensions and Open Research Problems
Potential evolutions of DeepCamo-centered UCOD research include:
- Underwater image enhancement as pre-processing to address distortion and turbidity.
- Cross-domain adaptation from terrestrial COD with adversarial or self-supervised learning objectives.
- Advancing from binary segmentation to instance-level detection for species classification and enumeration.
- Real-time UCOD for ROV inspection via model compression and acceleration.
DeepCamo provides a rigorous evaluation benchmark for advancing object detection in challenging marine contexts. By systematically integrating diverse environmental artifacts and diligently curated object annotations, it promotes the development and assessment of robust, generalizable UCOD methodologies for practical marine ecological monitoring (Wang et al., 4 Sep 2025).