
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models (2306.04744v3)

Published 7 Jun 2023 in cs.CV

Abstract: The rapid advancement of generative models, which enables the creation of hyper-realistic images from textual descriptions, has escalated critical societal concerns such as misinformation. Traditional fingerprinting mechanisms provide some mitigation but fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resulting content that can be traced back to the user. This approach, which incorporates fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods by an average of 11% under image post-processing. Our method presents a promising and novel avenue for accountable model distribution and responsible use. Our code is available at https://github.com/kylemin/WOUAF.
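
As a rough illustration of the idea named in the title, the sketch below shows one way a per-user fingerprint could modulate a layer's weights, and how attribution accuracy could be scored bit by bit. It is a toy under stated assumptions, not the authors' implementation: the affine mapping, the per-input-channel scaling, the bitwise accuracy metric, and all function names (`fingerprint_to_scales`, `modulate`, `bit_accuracy`) are hypothetical choices made for this example.

```python
import random


def fingerprint_to_scales(bits, mapping, bias):
    """Affine map from a binary fingerprint to one scale per input channel.

    Assumption: scales are centred on 1.0 so that a zero mapping and zero
    bias leave the original weights unchanged.
    """
    return [
        1.0 + b + sum(m * x for m, x in zip(row, bits))
        for row, b in zip(mapping, bias)
    ]


def modulate(weight, scales):
    """Scale column j of a (out x in) weight matrix by scales[j]."""
    return [[w * s for w, s in zip(row, scales)] for row in weight]


def bit_accuracy(true_bits, decoded_bits):
    """Fraction of fingerprint bits recovered correctly (assumed metric)."""
    matches = sum(t == d for t, d in zip(true_bits, decoded_bits))
    return matches / len(true_bits)


if __name__ == "__main__":
    rng = random.Random(0)
    n_bits, n_in, n_out = 8, 4, 3

    # Hypothetical mapping parameters and base layer weights.
    mapping = [[rng.uniform(-0.05, 0.05) for _ in range(n_bits)]
               for _ in range(n_in)]
    bias = [0.0] * n_in
    base_weight = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]

    # Two users with different fingerprints receive different weights.
    user_a = [rng.randint(0, 1) for _ in range(n_bits)]
    user_b = [1 - bit for bit in user_a]
    weight_a = modulate(base_weight,
                        fingerprint_to_scales(user_a, mapping, bias))
    weight_b = modulate(base_weight,
                        fingerprint_to_scales(user_b, mapping, bias))

    print("weights differ:", weight_a != weight_b)
    print("bit accuracy vs. own fingerprint:", bit_accuracy(user_a, user_a))
```

In this toy setup each user's copy of the model differs only by a fingerprint-dependent rescaling of the weights, so the identifier is carried by the model itself rather than stamped on images after generation.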
