LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion (2404.00292v4)
Abstract: Camouflaged vision perception is an important vision task with numerous practical applications. Because collecting and labeling data is expensive, the community faces a major bottleneck: its datasets cover only a small number of object species. Existing camouflaged generation methods, however, require the background to be specified manually, and therefore cannot extend the diversity of camouflaged samples at low cost. In this paper, we propose Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our main contributions are: (1) We propose, for the first time, a camouflaged generation paradigm that does not require any background input. (2) LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation; it explicitly separates knowledge retrieval from reasoning enhancement to alleviate the task-specific challenges. Moreover, our method is not restricted to specific foreground targets or backgrounds, offering the potential to extend camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms existing approaches, generating more realistic camouflaged images.