DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection (2306.01272v3)
Abstract: The tremendous recent advances in generative artificial intelligence techniques have led to significant successes and promise in a wide range of applications, from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been growing concern over its use for malicious purposes. In the realm of visual content synthesis using generative AI, key areas of concern have been image forgery (e.g., generation of images containing or derived from copyrighted content) and data poisoning (i.e., generation of adversarially contaminated images). Motivated to address these concerns and encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in building machine learning algorithms for generative AI art forgery and data poisoning detection. Comprising over 32,000 records across a variety of generative forgery and data poisoning techniques, the dataset consists of entries that each contain a pair of images labeled as either a forgery / adversarially contaminated pair or not. Every generated image in the DeepfakeArt Challenge benchmark dataset \footnote{The link to the dataset: http://anon\_for\_review.com} has been comprehensively quality checked.
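The dataset is organized as image pairs with a binary forgery / contamination label. As one way to picture that structure, the sketch below shows a hypothetical Python record type and manifest loader; the CSV manifest format and the column names (image_a, image_b, label, method) are illustrative assumptions, not the dataset's actual layout.

```python
# Minimal sketch of loading paired records from a benchmark like DeepfakeArt.
# Assumptions (not specified by the paper): a CSV manifest with columns
# "image_a", "image_b", "label", and "method" describing each image pair.
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class PairRecord:
    image_a: str   # path to the first image of the pair
    image_b: str   # path to the second image of the pair
    label: int     # 1 if the pair is a forgery / adversarially contaminated, 0 otherwise
    method: str    # generation or poisoning technique (e.g., inpainting), assumed field


def load_manifest(path: str) -> List[PairRecord]:
    """Read a hypothetical CSV manifest into a list of PairRecord entries."""
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            records.append(
                PairRecord(
                    image_a=row["image_a"],
                    image_b=row["image_b"],
                    label=int(row["label"]),
                    method=row["method"],
                )
            )
    return records
```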