
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models (2403.12052v3)

Published 4 Jan 2024 in cs.CV

Abstract: Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies enable the unauthorized learning and replication of copyrighted content, artistic creations, and likenesses, leading to the proliferation of unregulated content. Notably, models like Stable Diffusion, which excel in text-to-image synthesis, heighten the risk of copyright infringement and unauthorized distribution. Machine unlearning, which seeks to eradicate the influence of specific data or concepts from machine learning models, emerges as a promising solution by eliminating the "copyright memories" ingrained in diffusion models. Yet, the absence of comprehensive large-scale datasets and standardized benchmarks for evaluating the efficacy of unlearning techniques in copyright protection scenarios impedes the development of more effective unlearning methods. To address this gap, we introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset. This dataset encompasses anchor images, associated prompts, and images synthesized by text-to-image models. Additionally, we have developed a mixed metric based on semantic and style information, validated through both human and artist assessments, to gauge the effectiveness of unlearning approaches. Our dataset, benchmark library, and evaluation metrics will be made publicly available to foster future research and practical applications (website: https://rmpku.github.io/CPDM-page/; dataset: http://149.104.22.83/unlearning.tar.gz).
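The abstract describes a mixed evaluation metric that combines semantic and style information. The paper's exact formulation is not given here, so the following is only an illustrative sketch under assumed conventions: semantic similarity as cosine similarity between image embeddings (e.g., from CLIP), style similarity via Gram matrices of convolutional feature maps (in the spirit of neural style transfer), and a weight `alpha` blending the two. All function names and the blending scheme are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Semantic similarity between two embedding vectors (e.g., CLIP image embeddings)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gram_matrix(features):
    """Gram matrix of a feature map flattened to shape (channels, height*width)."""
    return features @ features.T / features.shape[1]

def style_similarity(feat_a, feat_b):
    """Style agreement from Gram-matrix distance, mapped into (0, 1]; 1 means identical style."""
    dist = np.linalg.norm(gram_matrix(feat_a) - gram_matrix(feat_b))
    return 1.0 / (1.0 + dist)

def mixed_score(sem_a, sem_b, feat_a, feat_b, alpha=0.5):
    """Hypothetical mixed metric: weighted blend of semantic and style similarity."""
    return alpha * cosine_similarity(sem_a, sem_b) + (1 - alpha) * style_similarity(feat_a, feat_b)
```

For an unlearning benchmark, one would expect this score between an unlearned model's output and the protected anchor image to drop after unlearning, while remaining high for unrelated concepts the model should retain.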

Authors (14)
  1. Rui Ma
  2. Qiang Zhou
  3. Bangjun Xiao
  4. Yizhu Jin
  5. Daquan Zhou
  6. Xiuyu Li
  7. Aishani Singh
  8. Yi Qu
  9. Kurt Keutzer
  10. Xiaodong Xie
  11. Jingtong Hu
  12. Zhen Dong
  13. Shanghang Zhang
  14. Shiji Zhou
Citations (1)
