DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception (2403.13304v1)
Abstract: Current perceptive models depend heavily on resource-intensive datasets, prompting the need for alternative sources of training data. Leveraging recent advances in diffusion models, synthetic data constructed from various annotations proves beneficial for downstream tasks. While prior methods address generative and perceptive models separately, DetDiffusion, for the first time, harmonizes the two, tackling the challenge of generating data that is genuinely effective for perceptive models. To enhance image generation with perceptive models, we introduce a perception-aware loss (P.A. loss) computed through segmentation, improving both quality and controllability. To boost the performance of a specific perceptive model, our method customizes data augmentation by extracting and utilizing a perception-aware attribute (P.A. Attr) during generation. Experiments on object detection demonstrate DetDiffusion's superior performance, establishing a new state of the art in layout-guided generation. Furthermore, images synthesized by DetDiffusion effectively augment training data, significantly improving downstream detection performance.
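To make the two components concrete, the sketch below illustrates, under stated assumptions, how a perception-aware loss and a perception-aware attribute might be wired into a diffusion training pipeline. This is not the authors' released code: the U-Net interface (`return_features=True`), the segmentation head, the weighting term `lambda_pa`, the detector interface, and the "easy"/"hard" tags are illustrative assumptions, not details confirmed by the abstract.

```python
# Minimal sketch (assumptions noted in comments), not the authors' implementation.
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou


def training_step(unet, seg_head, scheduler, latents, text_emb, layout_cond,
                  gt_masks, lambda_pa=0.1):
    """One hypothetical training step combining the diffusion loss with a P.A. loss."""
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.num_train_timesteps,
                      (latents.size(0),), device=latents.device)
    noisy_latents = scheduler.add_noise(latents, noise, t)

    # Assumed interface: the U-Net returns its noise prediction together with
    # intermediate decoder features when asked to.
    pred_noise, feats = unet(noisy_latents, t, text_emb, layout_cond,
                             return_features=True)
    diffusion_loss = F.mse_loss(pred_noise, noise)

    # Hypothetical P.A. loss: a lightweight segmentation head supervises the
    # intermediate features with ground-truth masks, pushing the generator toward
    # images whose objects a perceptive model can recover.
    seg_logits = seg_head(feats)                      # (B, num_classes, H, W)
    pa_loss = F.cross_entropy(seg_logits, gt_masks)   # gt_masks: (B, H, W), long

    return diffusion_loss + lambda_pa * pa_loss


def pa_attributes(detector, image, gt_boxes, iou_thr=0.5):
    """Hypothetical P.A. Attr extraction: tag each annotated object as "easy" or
    "hard" according to whether a pretrained detector recovers it; the tags can
    then be folded into the per-object conditions used during generation."""
    pred_boxes = detector(image)                      # assumed: (N, 4) tensor
    if pred_boxes.numel() == 0:
        return ["hard"] * len(gt_boxes)
    ious = box_iou(gt_boxes, pred_boxes).max(dim=1).values
    return ["easy" if iou >= iou_thr else "hard" for iou in ious]
```

In this reading, the P.A. loss adds dense supervision that improves generation quality and controllability, while the P.A. Attr tags would steer augmentation toward the cases a target detector currently finds difficult.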