
GenCAMO-DB: Large-Scale Camouflage Dataset

Updated 10 January 2026
  • GenCAMO-DB is a dataset of 34,200 images with rich multi-modal annotations for camouflage scene analysis.
  • It integrates RGB, depth maps, scene graphs, structured captions, and generated masks to support tasks like CIG, S2RCDP, COD, and OVCOS.
  • Benchmark results demonstrate its effectiveness, achieving lower FID/KID scores and improved detection metrics compared to previous datasets.

GenCAMO-DB is a large-scale, publicly released dataset designed to advance research in complex camouflage scene understanding and dense prediction. It provides 34,200 still images annotated with multi-modal information—including RGB frames, depth maps, scene graphs, fine-grained attribute lists, and structured text prompts—to train and benchmark models for tasks such as camouflage image–mask generation (CIG), synthetic-to-real camouflage dense prediction (S2RCDP), camouflage object detection (COD), and open-vocabulary camouflage object segmentation (OVCOS). GenCAMO-DB incorporates both real and synthetic images acquired or generated from twelve open-source collections, under a mask-free pipeline optimized for rich annotation and broad contextual diversity (Chen et al., 3 Jan 2026).

1. Dataset Composition and Annotation Modalities

GenCAMO-DB comprises 34,200 images collected and synthesized from three principal sources: open-domain RGB datasets with semantic graphs (including COCO-Stuff and Visual Genome), camouflage-image benchmarks, and salient/general segmentation data from LAKERED. For CIG and S2RCDP tasks, a dedicated GenCAMO-DB-LAKERED split covers 4,040 training and 12,946 test images, ensuring balanced representation across “concealed,” “salient,” and “general” contexts at a roughly 1:3 ratio.

Each sample is annotated under four dense modalities:

  • Scene Graphs: Stored in JSON, scene graphs $G = (O, E)$ model object categories ($o_i \in O$, e.g., “chameleon”, “leaf”) and relations ($e_{ij} \in E$, e.g., “hides behind”, “contacts”). Directed edges are enriched as quintuples $t_{ij} = (a_i, o_i, e_{ij}, o_j, a_j)$, incorporating source/target concealment attributes. Embeddings include $E^o_{emb}$ (object IDs), $E^a_{emb}$ (attributes), and $E^e_{emb}$ (relations).
  • Concealment Attributes: Each object is assigned attributes $a_i$, drawn from a closed vocabulary describing color, pattern, and texture. The 15 most frequent attributes are: green, brown, rough, textured, speckled, mottled, smooth, grey, yellow, granular, striped, rugged, dappled, tarnished, and shiny. These are stored in the scene-graph JSON.
  • Text Prompts: Each image receives a single GPT-4o–generated caption $C_p$ under a structured template emphasizing subject–verb–object (SVO) syntax, concealment cues, environmental context, and explicit spatial/contact relations. GenCAMO-DB yields approximately 612,500 words of caption text.
  • Foreground Masks: Although the pipeline is mask-free at generation time, a diffusion-based decoder (DiffuMask-style) followed by SAM2 refinement produces approximate segmentation masks, stored as 8-bit PNGs.
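
The quintuple form of the enriched edges can be illustrated with a small sketch. The JSON key names used here ("objects", "relations", "attributes", "subject", "predicate", "object") and the `Quintuple` and `load_quintuples` names are assumptions made for illustration; the released scene-graph files may be laid out differently.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Quintuple:
    """One enriched edge t_ij = (a_i, o_i, e_ij, o_j, a_j)."""
    source_attrs: tuple   # a_i: concealment attributes of the source object
    source: str           # o_i: source object category
    relation: str         # e_ij: relation label
    target: str           # o_j: target object category
    target_attrs: tuple   # a_j: concealment attributes of the target object


def load_quintuples(graph_json: str) -> list:
    """Expand a scene-graph JSON string into enriched quintuples.

    The key names are hypothetical and chosen only for this sketch.
    """
    graph = json.loads(graph_json)
    objects = {obj["id"]: obj for obj in graph["objects"]}
    quintuples = []
    for rel in graph["relations"]:
        src = objects[rel["subject"]]
        dst = objects[rel["object"]]
        quintuples.append(Quintuple(
            tuple(src["attributes"]), src["name"],
            rel["predicate"],
            dst["name"], tuple(dst["attributes"]),
        ))
    return quintuples


# Toy graph built from the object and relation examples mentioned above.
EXAMPLE = json.dumps({
    "objects": [
        {"id": 0, "name": "chameleon", "attributes": ["green", "textured"]},
        {"id": 1, "name": "leaf", "attributes": ["green", "smooth"]},
    ],
    "relations": [{"subject": 0, "predicate": "hides behind", "object": 1}],
})

quintuples = load_quintuples(EXAMPLE)
print(quintuples[0])
```

Keeping attributes on the objects and expanding them onto edges at load time avoids duplicating attribute lists in the stored JSON, though whether the dataset does this is not stated in the source.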

All images are provided at 512×512 px for generative/benchmarking purposes, with original resolutions preserved for further annotation.

2. Data Generation and Quality Assurance

Data acquisition leverages a semi-automatic annotation pipeline. Key steps include:

  • Selection of camouflage-like scenes from open-domain RGB datasets containing scene graphs.
  • Augmentation of existing camouflage benchmarks with generated depth, scene graphs, and captions.
  • Extension of SOD and SEG samples from LAKERED with camouflage-style annotations.

Depth maps are predicted via Depth Anything; scene graphs via Universal SG, then verified and refined for camouflage-relevant relations; captions are generated by GPT-4o under a structured template. Each sample undergoes 5–10 minutes of human verification, inspected for cross-modal consistency and camouflage-likelihood. Samples failing modality-alignment are re-annotated.
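
To give a sense of scale for the verification step, the following back-of-the-envelope sketch converts the stated 5–10 minutes per sample into total annotator-hours. It assumes that all 34,200 images are verified exactly once by a single annotator, which the source does not state explicitly; the function name is illustrative.

```python
NUM_IMAGES = 34_200            # dataset size stated in the article
MINUTES_PER_SAMPLE = (5, 10)   # stated per-sample verification range


def verification_hours(num_samples: int, minutes_per_sample: float) -> float:
    """Total verification time in hours for one pass over the samples."""
    return num_samples * minutes_per_sample / 60


low_hours = verification_hours(NUM_IMAGES, MINUTES_PER_SAMPLE[0])
high_hours = verification_hours(NUM_IMAGES, MINUTES_PER_SAMPLE[1])
print(f"{low_hours:.0f} to {high_hours:.0f} annotator-hours")
```

Under these assumptions the pass costs between 2,850 and 5,700 annotator-hours, before counting any re-annotation of failed samples.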

The GenCAMO generator, based on Stable Diffusion v1.5, ControlNet, and OpenCLIP ViT-H/14, synthesizes camouflage image–annotation triplets. Its generative pipeline incorporates:

  • Depth–Layout Coherence Guided ControlNet (DLCG): Fuses scene-graph layout and depth features to maintain environment-aware consistency.
  • Attribute-aware Mask Attention (AMA): Guarantees pixel-wise attention to correct object–attribute pairs.
  • Unified LDM Decoder: Produces image, depth, and mask channels jointly.

Key objectives include:

  • Depth–layout coherence loss: defined over the distance $d_i = \min_m (1 - S(F_Q(i), p_m))$, that is, one minus the highest score $S$ between $F_Q(i)$ and any $p_m$.
  • Joint diffusion objective: combines its loss terms with equal weights.
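
The distance $d_i = \min_m (1 - S(F_Q(i), p_m))$ can be sketched in a few lines. The source does not define $S$, $F_Q$, or $p_m$ here, so this sketch assumes that $S$ is cosine similarity, that $F_Q(i)$ is a feature vector, and that the $p_m$ are reference (prototype) vectors; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_distance(query_feature, prototypes, similarity=cosine_similarity):
    """d_i = min_m (1 - S(F_Q(i), p_m)).

    ASSUMPTION: S is cosine similarity and p_m are prototype vectors.
    Raises ValueError if `prototypes` is empty.
    """
    return min(1.0 - similarity(query_feature, p) for p in prototypes)


prototypes = [[1.0, 0.0], [0.0, 1.0]]
d_aligned = coherence_distance([2.0, 0.0], prototypes)   # parallel to p_0
d_between = coherence_distance([1.0, 1.0], prototypes)   # 45 degrees from both
print(d_aligned, d_between)
```

A feature that points in the same direction as some prototype gets distance 0, while a feature halfway between two orthogonal prototypes gets $1 - 1/\sqrt{2}$, so the distance penalizes features that match no reference well.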

3. File Formats and Access Infrastructure

GenCAMO-DB uses a consistent file organization:

  • RGB images: 512×512 PNG, stored in /images/.
  • Depth maps: 16-bit PNG, stored in /depth/.
  • Scene graphs: JSON lists of nodes and relations, in /scene_graphs/.
  • Captions: Text files, one sentence each, in /captions/.
  • Masks: 8-bit PNG, in /masks/.

Each file is indexed by a unique 4- or 5-digit ID (e.g., 00023.png, 00023_depth.png, 00023.json). Dataset access is facilitated via a PyTorch DataLoader and a command-line API, permitting iteration over (image, depth, graph, prompt, mask) tuples.
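
The layout above can be turned into a simple path index. This is a minimal sketch rather than the dataset's actual loader: the caption and mask file names, the root directory name, and the `sample_paths` and `GenCamoIndex` names are assumptions, and only the image, depth, and scene-graph names follow the examples given in the text.

```python
from pathlib import Path


def sample_paths(root, sample_id: int, width: int = 5) -> dict:
    """Build per-modality paths for one sample ID (zero-padded to `width`)."""
    root = Path(root)
    stem = f"{sample_id:0{width}d}"
    return {
        "image": root / "images" / f"{stem}.png",
        "depth": root / "depth" / f"{stem}_depth.png",
        "graph": root / "scene_graphs" / f"{stem}.json",
        "caption": root / "captions" / f"{stem}.txt",   # assumed file name
        "mask": root / "masks" / f"{stem}.png",          # assumed file name
    }


class GenCamoIndex:
    """Map-style index exposing __len__ and __getitem__ over sample IDs."""

    def __init__(self, root, sample_ids):
        self.root = Path(root)
        self.sample_ids = list(sample_ids)

    def __len__(self):
        return len(self.sample_ids)

    def __getitem__(self, index):
        # Returns paths only; decoding files into arrays is left to the caller.
        return sample_paths(self.root, self.sample_ids[index])


index = GenCamoIndex("GenCAMO-DB", [23, 24])
print(index[0]["depth"])
```

A class of this shape follows the map-style dataset protocol that PyTorch's `DataLoader` accepts, although a real loader would decode each file into tensors before batching.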

Users may re-partition the dataset, for example into a conventional 70/10/20 train/val/test split, to suit specific training, validation, and test requirements, including customized balancing of camouflage difficulty.
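
A reproducible 70/10/20 re-partition can be sketched as follows. The function name, the fixed seed, and the use of consecutive integer IDs are illustrative assumptions, and the sketch does not balance camouflage difficulty, which would require a stratified variant.

```python
import random


def split_ids(sample_ids, ratios=(70, 10, 20), seed=0):
    """Shuffle IDs deterministically and cut them into train/val/test."""
    if sum(ratios) != 100:
        raise ValueError("ratios must sum to 100")
    ids = sorted(sample_ids)
    random.Random(seed).shuffle(ids)   # local RNG, global state untouched
    n_train = len(ids) * ratios[0] // 100
    n_val = len(ids) * ratios[1] // 100
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],   # remainder goes to the test split
    }


splits = split_ids(range(34_200))
print({name: len(ids) for name, ids in splits.items()})
```

For 34,200 samples this gives 23,940 training, 3,420 validation, and 6,840 test IDs. Sorting before shuffling makes the result independent of the order in which IDs are listed on disk.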

4. Benchmarking Protocols and Results

GenCAMO-DB supports two principal benchmark families:

  • Camouflage Image–Mask Generation (CIG): Evaluated by Fréchet Inception Distance (FID) and Kernel Inception Distance (KID); lower is better for both. On the test split, GenCAMO achieves FID=18.49, KID=0.0025, outperforming LAKERED (FID=64.27, KID=0.0355) and MIP-Adapter (FID=68.26, KID=0.0391).
  • Synthetic-to-Real Camouflage Dense Prediction (S2RCDP): Encompasses RGB COD, RGB-D COD, and OVCOS.
    • RGB/RGB-D COD metrics: MAE↓, S-measure $S_\alpha$↑, E-measure $E_\phi$↑, and weighted F-measure $F_\beta^w$↑. SINet-v2 + CSRDA fine-tuned on GenCAMO synthetic data surpasses the LAKERED baselines on these metrics; see Chen et al. (3 Jan 2026) for the reported values.
    • OVCOS: OV-Camo is evaluated when trained on GenCAMO data alone and when trained on real plus GenCAMO data; the combined setting sets a new state of the art (Chen et al., 3 Jan 2026).
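
Of the COD metrics listed, MAE is the simplest to state: it is the mean absolute difference between a predicted map and the ground-truth mask over all pixels. The sketch below uses nested lists with values in [0, 1] so that it stays dependency-free; the function name and the toy inputs are illustrative and are not taken from the benchmark code.

```python
def mean_absolute_error(prediction, ground_truth):
    """Pixel-wise MAE between a prediction map and a binary mask.

    Both inputs are nested lists (rows of values in [0, 1]) of equal shape.
    """
    if len(prediction) != len(ground_truth):
        raise ValueError("row counts differ")
    total, count = 0.0, 0
    for pred_row, gt_row in zip(prediction, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("row lengths differ")
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


pred = [[0.5, 0.0], [1.0, 0.5]]   # toy 2x2 prediction map
mask = [[1, 0], [1, 0]]           # toy 2x2 binary ground truth
mae = mean_absolute_error(pred, mask)
print(mae)
```

Two of the four toy pixels are off by 0.5 and two are exact, so the MAE is 0.25; lower values indicate a closer match, consistent with the ↓ marker.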

A plausible implication is that synthetic data from GenCAMO-DB can significantly enhance model generalization in camouflage dense prediction benchmarks.

5. Annotation Schema and Attribute Statistics

GenCAMO-DB’s image data and annotation modalities are summarized below:

| Modality | Description | File Format/Location |
| --- | --- | --- |
| RGB Image | 512×512 color image | PNG, /images/ |
| Depth Map | 16-bit predicted depth | PNG, /depth/ |
| Scene Graph | Nodes (objects), enriched edges (relations) | JSON, /scene_graphs/ |
| Caption | SVO-structured image description (~612,500 words total) | TXT, /captions/ |
| Mask | Approximate foreground mask (8-bit) | PNG, /masks/ |

Attribute frequency is led by “green,” “brown,” “rough,” and similar color/texture descriptors. Scene graphs articulate not only object categories and spatial relations but also pairwise concealment attributes, encapsulated in quintuple-form edges. This multi-modal schema is optimized to support environment-aware, contextually rich training and benchmarking.
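
An attribute ranking of this kind can be computed by counting attribute occurrences over all scene graphs. The sketch below assumes each graph is a dictionary with an "objects" list whose entries carry an "attributes" list; these key names and the toy graphs are hypothetical.

```python
from collections import Counter


def attribute_frequencies(scene_graphs, top_k=15):
    """Count attribute occurrences across graphs and return the top_k."""
    counts = Counter()
    for graph in scene_graphs:
        for obj in graph["objects"]:
            counts.update(obj["attributes"])
    return counts.most_common(top_k)


# Toy graphs for illustration only; not taken from the dataset.
graphs = [
    {"objects": [
        {"name": "chameleon", "attributes": ["green", "textured"]},
        {"name": "leaf", "attributes": ["green", "smooth"]},
    ]},
    {"objects": [
        {"name": "moth", "attributes": ["brown", "mottled"]},
        {"name": "bark", "attributes": ["brown", "rough", "green"]},
    ]},
]

top = attribute_frequencies(graphs, top_k=2)
print(top)
```

On the toy input, "green" occurs three times and "brown" twice, so they lead the ranking in that order.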

6. Applications and Prospective Extensions

GenCAMO-DB is suited for fine-grained scene understanding under occlusion, with demonstrable utility in agricultural pest monitoring, industrial defect inspection, ecological biodiversity assessment, and augmented-reality concealment mechanisms. Potential extensions include integration of thermal/multispectral modalities, temporal data (video camouflage), 3D point-cloud annotation, dynamic scene graphs, broader environmental coverage (e.g., underwater, desert scenes), physics-based lighting priors, and human-in-the-loop refinement.

This suggests an evolving utility of GenCAMO-DB as a foundational resource for diverse occlusion-centric tasks and multimodal scene analysis in complex domains (Chen et al., 3 Jan 2026).
