Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement (2309.07254v4)
Abstract: While diffusion models demonstrate a remarkable capability for generating high-quality images, their tendency to "replicate" training data raises privacy concerns. Although recent research suggests that this replication may stem from the insufficient generalization of training-data captions and the duplication of training images, effective mitigation strategies remain elusive. To address this gap, our paper first introduces a generality score that measures caption generality and employs a large language model (LLM) to generalize training captions. We then leverage these generalized captions and propose a novel dual fusion enhancement approach to mitigate replication in diffusion models. Our empirical results demonstrate that the proposed methods significantly reduce replication, by 43.5% compared to the original diffusion model, while maintaining the diversity and quality of generations. Code is available at https://github.com/HowardLi0816/dual-fusion-diffusion.
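To make the notion of caption generality concrete, below is a minimal, hypothetical Python sketch of one way such a score could be computed, using the average WordNet hypernym depth of a caption's nouns as a proxy for specificity. The function name `generality_score` and the scoring formula are illustrative assumptions, not the paper's actual definition.

```python
# Hypothetical sketch of a WordNet-depth-based caption generality score.
# Assumption: shallower concepts in the WordNet hypernym hierarchy
# ("animal") are more general than deeper ones ("dalmatian"). The paper's
# actual score may be defined differently.
import string

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')


def generality_score(caption: str) -> float:
    """Heuristic generality score; higher values mean a more general caption."""
    depths = []
    for token in caption.lower().split():
        word = token.strip(string.punctuation)
        synsets = wn.synsets(word, pos=wn.NOUN)
        if synsets:
            # min_depth(): length of the shortest hypernym path to the
            # WordNet root; smaller depth = more general concept.
            depths.append(synsets[0].min_depth())
    if not depths:
        return 0.0  # no recognizable nouns; treat as uninformative
    avg_depth = sum(depths) / len(depths)
    return 1.0 / (1.0 + avg_depth)


# Expected behavior: the broader caption should score higher than the
# more specific one (exact values depend on the WordNet version).
print(generality_score("a dog on the grass"))
print(generality_score("a dalmatian puppy on a manicured lawn"))
```

Under this proxy, replacing specific nouns with their hypernyms raises the score, which is consistent with the paper's premise that LLM-generalized captions weaken the tie between a training image and an overly specific caption. A production version would presumably POS-tag the caption and filter stopwords rather than split on whitespace.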
Authors: Chenghao Li, Dake Chen, Yuke Zhang, Peter A. Beerel