Watermark-based Attribution of AI-Generated Content (2404.04254v3)

Published 5 Apr 2024 in cs.CR, cs.AI, cs.CL, cs.CV, and cs.LG

Abstract: Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution--the ability to trace back to the user of a generative AI (GenAI) service who created a given piece of AI-generated content--remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: How should watermarks be selected for users to maximize attribution performance? To address the challenge, we first theoretically derive lower bounds on detection and attribution performance through rigorous probabilistic analysis for any given set of user watermarks. Then, we select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is either not post-processed or subjected to common post-processing such as JPEG compression, as well as black-box adversarial post-processing with limited query budgets.
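The detection and attribution steps described in the abstract can be illustrated with a short sketch. It assumes that each user's watermark is a fixed-length bitstring and that a watermark decoder (not shown) has already extracted a bitstring from the content. The function names, the bitwise-accuracy threshold `tau`, and the watermark-selection routine are illustrative assumptions, not the paper's method. In particular, `select_watermark` is a swapped-in greedy heuristic that keeps each new user's watermark far, in Hamming distance, from the existing ones. The paper instead selects watermarks by maximizing its derived lower bounds on detection and attribution performance.

```python
import random
from typing import List, Optional, Sequence


def bitwise_accuracy(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length bitstrings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("watermarks must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bitstrings differ."""
    return sum(x != y for x, y in zip(a, b))


def detect(extracted: Sequence[int], user_watermarks: Sequence[Sequence[int]], tau: float) -> bool:
    """Assumed detection rule: flag content as AI-generated if the extracted
    bitstring matches at least one user's watermark with accuracy >= tau."""
    return any(bitwise_accuracy(extracted, w) >= tau for w in user_watermarks)


def attribute(extracted: Sequence[int], user_watermarks: Sequence[Sequence[int]], tau: float) -> Optional[int]:
    """Return the index of the user whose watermark best matches the extracted
    bitstring, or None if the content does not pass detection."""
    if not detect(extracted, user_watermarks, tau):
        return None
    scores = [bitwise_accuracy(extracted, w) for w in user_watermarks]
    return max(range(len(scores)), key=scores.__getitem__)


def select_watermark(existing: Sequence[Sequence[int]], n_bits: int,
                     n_candidates: int, rng: random.Random) -> List[int]:
    """Hypothetical greedy heuristic (not the paper's algorithm): draw random
    candidate bitstrings and keep the one whose minimum Hamming distance to
    all existing user watermarks is largest."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate")
    candidates = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_candidates)]
    if not existing:
        return candidates[0]
    return max(candidates, key=lambda c: min(hamming(c, w) for w in existing))


if __name__ == "__main__":
    rng = random.Random(0)
    users: List[List[int]] = []
    for _ in range(8):
        users.append(select_watermark(users, n_bits=32, n_candidates=50, rng=rng))
    # Simulate mild post-processing by flipping three bits of user 5's watermark.
    extracted = list(users[5])
    for i in (0, 7, 19):
        extracted[i] ^= 1
    print("detected:", detect(extracted, users, tau=0.8))
    print("attributed to user:", attribute(extracted, users, tau=0.8))
```

The heuristic rests on the intuition that well-separated watermarks are less likely to be confused when post-processing flips some extracted bits; it is only a stand-in for the paper's bound-maximizing selection.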
