GenCAMO-DB: Large-Scale Camouflage Dataset
- GenCAMO-DB is a comprehensive dataset comprising 34,200 images with rich multi-modal annotations for camouflage scene analysis.
- It integrates RGB, depth maps, scene graphs, structured captions, and generated masks to support tasks like CIG, S2RCDP, COD, and OVCOS.
- Benchmark results demonstrate its effectiveness, achieving lower FID/KID scores and improved detection metrics compared to previous datasets.
GenCAMO-DB is a large-scale, publicly released dataset designed to advance research in complex camouflage scene understanding and dense prediction. It provides 34,200 still images annotated with multi-modal information—including RGB frames, depth maps, scene graphs, fine-grained attribute lists, and structured text prompts—to train and benchmark models for tasks such as camouflage image–mask generation (CIG), synthetic-to-real camouflage dense prediction (S2RCDP), camouflage object detection (COD), and open-vocabulary camouflage object segmentation (OVCOS). GenCAMO-DB incorporates both real and synthetic images acquired or generated from twelve open-source collections, under a mask-free pipeline optimized for rich annotation and broad contextual diversity (Chen et al., 3 Jan 2026).
1. Dataset Composition and Annotation Modalities
GenCAMO-DB comprises 34,200 images collected and synthesized from three principal sources: open-domain RGB datasets with semantic graphs (including COCO-Stuff and Visual Genome), camouflage-image benchmarks, and salient/general segmentation data from LAKERED. For the CIG and S2RCDP tasks, a dedicated GenCAMO-DB-LAKERED split covers 4,040 training and 12,946 test images, representing "concealed," "salient," and "general" contexts at a roughly 1:3 ratio.
Each sample is annotated under four dense modalities:
- Scene Graphs: Stored as JSON, scene graphs model object categories (e.g., "chameleon", "leaf") and relations (e.g., "hides behind", "contacts"). Directed edges are enriched into quintuples $(o_s, a_s, r, o_t, a_t)$ that pair the source and target objects, and their concealment attributes, with the relation. Embeddings cover object IDs, attributes, and relations (a hypothetical JSON sketch follows this list).
- Concealment Attributes: Each object is assigned a set of concealment attributes drawn from a closed vocabulary describing color, pattern, and texture. The top-15 attributes are: green, brown, rough, textured, speckled, mottled, smooth, grey, yellow, granular, striped, rugged, dappled, tarnished, and shiny. These are stored in the scene-graph JSON.
- Text Prompts: Each image receives a single GPT-4o–generated caption under a structured template emphasizing subject–verb–object (SVO) syntax, concealment cues, environmental context, and explicit spatial/contact relations. GenCAMO-DB yields approximately 612,500 words of caption text.
- Foreground Masks: Although the generation pipeline itself is mask-free, a diffusion-based decoder (DiffuMask-style) followed by SAM2 refinement produces approximate segmentation masks stored as 8-bit PNGs.
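To make the edge quintuple concrete, a single scene-graph record might look as follows. This is a hypothetical sketch: the field names and values are illustrative, not the dataset's exact schema.

```python
# Hypothetical GenCAMO-DB scene-graph record (illustrative schema, not official).
# Edges are quintuples (source, source-attributes, relation, target, target-attributes).
scene_graph = {
    "image_id": "00023",
    "objects": [
        {"id": 0, "category": "chameleon", "attributes": ["green", "textured"]},
        {"id": 1, "category": "leaf", "attributes": ["green", "smooth"]},
    ],
    "edges": [
        {
            "source": 0, "source_attrs": ["green", "textured"],
            "relation": "hides behind",
            "target": 1, "target_attrs": ["green", "smooth"],
        },
    ],
}
```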
All images are provided at 512×512 px for generative/benchmarking purposes, with original resolutions preserved for further annotation.
2. Data Generation and Quality Assurance
Data acquisition leverages a semi-automatic annotation pipeline. Key steps include:
- Selection of camouflage-like scenes from open-domain RGB datasets containing scene graphs.
- Augmentation of existing camouflage benchmarks with generated depth, scene graphs, and captions.
- Extension of SOD and SEG samples from LAKERED with camouflage-style annotations.
Depth maps are predicted via Depth Anything; scene graphs via Universal SG, then verified and refined for camouflage-relevant relations; captions are generated by GPT-4o under a structured template. Each sample undergoes 5–10 minutes of human verification, during which cross-modal consistency and camouflage plausibility are inspected. Samples failing modality-alignment checks are re-annotated.
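The depth-annotation step can be approximated with an off-the-shelf monocular depth estimator. A minimal sketch, assuming the Hugging Face `transformers` depth-estimation pipeline with a Depth Anything checkpoint; the model id and file paths are illustrative:

```python
from PIL import Image
from transformers import pipeline

# Monocular depth estimation with a Depth Anything checkpoint (model id is an
# assumption; any Hugging Face depth-estimation model fits this interface).
depth_estimator = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

image = Image.open("images/00023.png").convert("RGB")
result = depth_estimator(image)
result["depth"].save("depth/00023_depth.png")  # predicted depth as a PIL image
```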
The GenCAMO generator, based on Stable Diffusion v1.5, ControlNet, and OpenCLIP ViT-H/14, synthesizes camouflage image–annotation triplets. Its generative pipeline incorporates the following modules (a minimal backbone sketch follows the list):
- Depth–Layout Coherence Guided ControlNet (DLCG): Fuses scene-graph layout and depth features to maintain environment-aware consistency.
- Attribute-aware Mask Attention (AMA): Guarantees pixel-wise attention to correct object–attribute pairs.
- Unified LDM Decoder: Produces image, depth, and mask channels jointly.
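The DLCG and AMA modules are custom components of GenCAMO and are not available off the shelf, so the sketch below shows only the underlying Stable Diffusion v1.5 + depth-ControlNet substrate using standard `diffusers` components; the model ids, prompt, and file paths are illustrative:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Backbone wiring only: SD v1.5 conditioned on a depth map via ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Lossy 16-bit -> RGB conversion is acceptable for conditioning purposes.
depth_map = Image.open("depth/00023_depth.png").convert("RGB")
prompt = "a green chameleon hides behind a green leaf, mottled texture"
image = pipe(prompt, image=depth_map, num_inference_steps=30).images[0]
```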
Key objectives include:
- Depth–layout coherence loss $\mathcal{L}_{\mathrm{DLC}}$, which penalizes disagreement between the scene-graph layout and the depth structure of the generated image.
- Joint diffusion objective $\mathcal{L} = \lambda_{\mathrm{img}}\mathcal{L}_{\mathrm{img}} + \lambda_{\mathrm{depth}}\mathcal{L}_{\mathrm{depth}} + \lambda_{\mathrm{mask}}\mathcal{L}_{\mathrm{mask}}$ over the jointly decoded image, depth, and mask channels, with equal weights $\lambda_{\mathrm{img}} = \lambda_{\mathrm{depth}} = \lambda_{\mathrm{mask}} = 1$ (sketched in code below).
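A minimal sketch of the joint objective, assuming the unified decoder predicts noise for concatenated image/depth/mask latent channels. The equal weights match the text above; the channel split (4 image + 1 depth + 1 mask latent channels) is an assumption:

```python
import torch.nn.functional as F

def joint_diffusion_loss(eps_pred, eps_true, lambdas=(1.0, 1.0, 1.0)):
    """Equally weighted sum of per-modality diffusion MSE losses.
    eps_pred/eps_true: (B, 6, H, W) noise tensors; the 4/1/1 channel
    grouping for image/depth/mask is an assumption, not the paper's spec."""
    img_p, depth_p, mask_p = eps_pred[:, :4], eps_pred[:, 4:5], eps_pred[:, 5:6]
    img_t, depth_t, mask_t = eps_true[:, :4], eps_true[:, 4:5], eps_true[:, 5:6]
    l_img = F.mse_loss(img_p, img_t)
    l_depth = F.mse_loss(depth_p, depth_t)
    l_mask = F.mse_loss(mask_p, mask_t)
    return lambdas[0] * l_img + lambdas[1] * l_depth + lambdas[2] * l_mask
```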
3. File Formats and Access Infrastructure
GenCAMO-DB uses a consistent file organization:
- RGB images: 512×512 PNG, stored in /images/.
- Depth maps: 16-bit PNG, stored in /depth/.
- Scene graphs: JSON lists of nodes and relations, in /scene_graphs/.
- Captions: text files, one sentence each, in /captions/.
- Masks: 8-bit PNG, in /masks/.
Each file is indexed by a unique 4- or 5-digit ID (e.g., 00023.png, 00023_depth.png, 00023.json). Dataset access is facilitated via a PyTorch DataLoader and a command-line API, permitting iteration over (image, depth, graph, prompt, mask) tuples.
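A minimal loader over the documented layout might look as follows; this is a sketch, and the official DataLoader and CLI shipped with GenCAMO-DB may differ in naming and behavior (the caption file extension is an assumption):

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class GenCAMODataset(Dataset):
    """Sketch of a loader yielding (image, depth, graph, prompt, mask) tuples."""

    def __init__(self, root):
        self.root = Path(root)
        # Unique 4- or 5-digit IDs are recovered from the image filenames.
        self.ids = sorted(p.stem for p in (self.root / "images").glob("*.png"))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        i = self.ids[idx]
        image = Image.open(self.root / "images" / f"{i}.png").convert("RGB")
        depth = Image.open(self.root / "depth" / f"{i}_depth.png")
        graph = json.loads((self.root / "scene_graphs" / f"{i}.json").read_text())
        prompt = (self.root / "captions" / f"{i}.txt").read_text().strip()  # .txt assumed
        mask = Image.open(self.root / "masks" / f"{i}.png")
        return image, depth, graph, prompt, mask
```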
Users may re-partition the dataset to suit specific experimental needs, for example into a conventional 70/10/20 train/val/test split or with customized balancing of camouflage difficulty, as sketched below.
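For instance, a reproducible 70/10/20 re-partition can be built with PyTorch's `random_split`, reusing the loader sketch above (the dataset path is illustrative):

```python
import torch
from torch.utils.data import random_split

dataset = GenCAMODataset("GenCAMO-DB")  # path is illustrative
n = len(dataset)
n_train, n_val = int(0.7 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    dataset,
    [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0),  # fixed seed for reproducibility
)
```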
4. Benchmarking Protocols and Results
GenCAMO-DB supports two principal benchmark families:
- Camouflage Image–Mask Generation (CIG): Evaluated by Fréchet Inception Distance (FID) and Kernel Inception Distance (KID), both lower-is-better. On the test split, GenCAMO achieves FID = 18.49 and KID = 0.0025, outperforming LAKERED (FID = 64.27, KID = 0.0355) and MIP-Adapter (FID = 68.26, KID = 0.0391); a metric-computation sketch appears at the end of this section.
- Synthetic-to-Real Camouflage Dense Prediction (S2RCDP): Encompasses RGB COD, RGB-D COD, and OVCOS.
- RGB/RGB-D COD metrics: MAE↓, S-measure↑, E-measure↑, and weighted F-measure↑. SINet-v2 + CSRDA fine-tuned on GenCAMO synthetic data surpasses the LAKERED baselines on all four metrics.
- OVCOS metrics: OV-Camo trained on GenCAMO data alone already performs strongly, and combining real data with GenCAMO sets a new state of the art across all reported OVCOS metrics.
A plausible implication is that synthetic data from GenCAMO-DB can significantly enhance model generalization in camouflage dense prediction benchmarks.
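The CIG protocol can be reproduced with `torchmetrics`; the following is a hedged sketch, as the paper's exact Inception settings and batching are assumptions (the random tensors stand in for real and generated image batches):

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

# FID/KID over uint8 image tensors of shape (B, 3, H, W).
fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)  # subset_size must be <= sample count

real = torch.randint(0, 255, (100, 3, 512, 512), dtype=torch.uint8)  # stand-in: real images
fake = torch.randint(0, 255, (100, 3, 512, 512), dtype=torch.uint8)  # stand-in: generated images

for metric in (fid, kid):
    metric.update(real, real=True)
    metric.update(fake, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item())
```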
5. Annotation Schema and Attribute Statistics
GenCAMO-DB’s four annotation modalities are summarized below:
| Modality | Description | File Format/Location |
|---|---|---|
| RGB Image | 512×512 color image | PNG, /images/ |
| Depth Map | 16-bit predicted depth | PNG, /depth/ |
| Scene Graph | Nodes (objects), enriched edges (relations) | JSON, /scene_graphs/ |
| Caption | SVO-structured image description (~612,500 words total) | TXT, /captions/ |
| Mask | Approximate foreground mask | PNG, /masks/ |
Attribute frequency is led by “green,” “brown,” “rough,” and similar color/texture descriptors. Scene graphs articulate not only object categories and spatial relations but also pairwise concealment attributes, encapsulated in quintuple-form edges. This multi-modal schema is optimized to support environment-aware, contextually rich training and benchmarking.
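The top-15 attribute statistics can, in principle, be recomputed directly from the scene-graph JSONs. A minimal sketch, assuming the hypothetical `objects`/`attributes` field names from the record sketch in Section 1:

```python
import json
from collections import Counter
from pathlib import Path

# Count concealment-attribute frequencies across all scene graphs.
counts = Counter()
for path in Path("GenCAMO-DB/scene_graphs").glob("*.json"):
    graph = json.loads(path.read_text())
    for obj in graph.get("objects", []):        # field name is an assumption
        counts.update(obj.get("attributes", []))  # field name is an assumption

print(counts.most_common(15))  # expected to be led by "green", "brown", "rough"
```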
6. Applications and Prospective Extensions
GenCAMO-DB is suited for fine-grained scene understanding under occlusion, with demonstrable utility in agricultural pest monitoring, industrial defect inspection, ecological biodiversity assessment, and augmented-reality concealment mechanisms. Potential extensions include integration of thermal/multispectral modalities, temporal data (video camouflage), 3D point-cloud annotation, dynamic scene graphs, broader environmental coverage (e.g., underwater, desert scenes), physics-based lighting priors, and human-in-the-loop refinement.
This suggests an evolving utility of GenCAMO-DB as a foundational resource for diverse occlusion-centric tasks and multimodal scene analysis in complex domains (Chen et al., 3 Jan 2026).