Mist: Towards Improved Adversarial Examples for Diffusion Models (2305.12683v1)

Published 22 May 2023 in cs.CV and cs.AI

Abstract: Diffusion Models (DMs) have enabled great success in artificial-intelligence-generated content, especially artwork creation, while raising new concerns about intellectual property and copyright. For example, infringers can profit by imitating non-authorized human-created paintings with DMs. Recent research suggests that various adversarial examples for diffusion models can be effective tools against these copyright infringements. However, current adversarial examples show weak transferability across different painting-imitating methods and poor robustness under straightforward adversarial defenses such as noise purification. We find, surprisingly, that the transferability of adversarial examples can be significantly enhanced by exploiting a fused and modified adversarial loss term under consistent parameters. In this work, we comprehensively evaluate the cross-method transferability of adversarial examples. Experimental observations show that our method generates more transferable adversarial examples with even stronger robustness against the simple adversarial defense.

The paper introduces Mist, a method for generating adversarial examples that protect artwork against copyright infringement carried out with diffusion models (DMs). The method addresses the weaknesses of current adversarial examples, specifically their limited transferability across different painting-imitating methods and their lack of robustness against adversarial defenses such as noise purification. The authors find that the transferability of adversarial examples can be significantly improved by using a fused and modified adversarial loss term with consistent parameters.

The paper comprehensively evaluates the cross-method transferability of adversarial examples and demonstrates that Mist generates more transferable adversarial examples with enhanced robustness against simple adversarial defenses. The method combines two existing adversarial loss terms: a semantic loss and a textural loss.

The two loss terms are as follows (a code sketch of both follows the list):

  • Semantic Loss:

    $$\delta := \arg\max_{\delta} \mathbb{E}_{x'_{1:T}\sim u(x'_{1:T})}\,\mathcal{L}_{DM}(x',\theta), \quad \text{where } x\sim q(x),\ x'=x+\delta.$$

    • $\delta$ is the perturbation added to the original image $x$ to create the adversarial example $x'$.
    • $x'_{1:T}$ denotes a sampling of the latent variables.
    • $\mathcal{L}_{DM}(x',\theta)$ is the training loss of the diffusion model with parameters $\theta$.
    • $q(x)$ is the data distribution.
    • $u(x'_{1:T})$ is a uniform distribution.
  • Textural Loss:

    $$\delta := \arg\min_{\delta}\mathcal{L}_\mathcal{E}(x, \delta, y) = \arg\min_{\delta}\Vert\mathcal{E}(y) - \mathcal{E}(x + \delta)\Vert_2,$$

    • $\mathcal{E}$ denotes the image encoder of the latent diffusion model (LDM).
    • $x$ is the input image.
    • $y$ is the given target image.
    • $\delta$ is the perturbation.
    • $\mathcal{L}_\mathcal{E}$ is the loss function.
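
To make the two terms concrete, here is a minimal PyTorch sketch of both. The names `unet`, `scheduler`, and `encoder` are diffusers-style stand-ins for the frozen Stable Diffusion components (text conditioning omitted), not the authors' released code:

```python
import torch
import torch.nn.functional as F

def semantic_loss(unet, scheduler, latents, t, noise):
    """One-sample estimate of the DM training loss L_DM(x', theta):
    how well the frozen model denoises the adversarial latents."""
    noisy = scheduler.add_noise(latents, noise, t)   # x'_t
    pred = unet(noisy, t).sample                     # eps_theta(x'_t, t)
    return F.mse_loss(pred, noise)                   # ||eps - eps_theta||_2^2

def textural_loss(encoder, x_adv, y_target):
    """l2 distance between the LDM encoder's embeddings of the
    adversarial image x + delta and the target image y."""
    return torch.norm(encoder(y_target) - encoder(x_adv), p=2)
```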

The combined loss is defined as:

$$\delta := \arg\max_{\delta}\Big(w\,\mathbb{E}_{x'_{1:T}\sim u(x'_{1:T})}\,\mathcal{L}_{DM}(x',\theta) - \mathcal{L}_\mathcal{E}(x, \delta, y)\Big), \quad \text{where } x\sim q(x),\ x'=x+\delta.$$

  • $w$ is the fused weight, balancing the semantic and textural loss terms.
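
Building on the hypothetical helpers above, the fusion is a weighted difference of the two terms; the attack performs gradient ascent on this quantity, simultaneously maximizing the semantic term and minimizing the textural term:

```python
def fused_loss(unet, scheduler, encoder, x_adv, latents, t, noise, y_target, w=1e4):
    # w * L_DM(x', theta) - L_E(x, delta, y); ascending this raises the
    # diffusion training loss while pulling the encoding toward the target y.
    return (w * semantic_loss(unet, scheduler, latents, t, noise)
            - textural_loss(encoder, x_adv, y_target))
```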

The final loss term is:

$$\mathbb{E}_{t,\,\epsilon\sim \mathcal{N}(0,1)}\,\mathbb{E}_{x'_{t}\sim u(x'_{t})}\Big[w\Vert\epsilon - \epsilon_{\theta}(x'_t, t)\Vert^2_2 - \Vert\mathcal{E}(y) - \mathcal{E}(x + \delta)\Vert_2\Big]$$

  • $t$ is the timestep in the diffusion process.
  • $\epsilon$ is noise sampled from the standard normal distribution $\mathcal{N}(0,1)$.
  • $\epsilon_{\theta}(x'_t, t)$ is the noise predicted by the diffusion model at timestep $t$.
  • $x'_t$ is the perturbed image at timestep $t$.
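
In practice, the expectations are estimated by resampling $t$ and $\epsilon$ at every attack iteration. A hedged one-sample estimator continuing the sketch above (Stable Diffusion details such as latent scaling are omitted):

```python
def mist_objective(unet, scheduler, encoder, x_adv, y_target, w=1e4):
    latents = encoder(x_adv)                          # VAE latents of x + delta
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],),
                      device=latents.device)          # t ~ U{0, ..., T-1}
    noise = torch.randn_like(latents)                 # eps ~ N(0, I)
    return fused_loss(unet, scheduler, encoder, x_adv, latents, t, noise,
                      y_target, w=w)
```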

The paper provides three modes of operation corresponding to the different adversarial loss terms: semantic mode, textural mode, and fused mode, which uses the combined semantic and textural loss.
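
One way to read the three modes is as settings of the two term weights in the fused objective. The mapping below is an interpretation for illustration, not the tool's actual interface:

```python
def mode_weights(mode, w=1e4):
    """(semantic weight, textural weight) for each Mist mode."""
    return {"semantic": (w, 0.0),    # semantic loss only
            "textural": (0.0, 1.0),  # textural loss only
            "fused":    (w, 1.0)}[mode]
```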

The performance of targeted adversarial examples is sensitive to the choice of target image in the textural loss. The paper suggests that images with a high contrast ratio and sharp Canny edges are better suited as target images $y$.
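
A simple screening heuristic in this spirit scores candidate targets by RMS contrast and Canny edge density with OpenCV; the scoring rule and thresholds are illustrative assumptions, not from the paper:

```python
import cv2
import numpy as np

def target_score(path):
    """Return (contrast, edge_density) for a candidate target image y;
    prefer candidates where both values are high."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    contrast = gray.std() / 255.0                  # RMS contrast, in [0, 1]
    edges = cv2.Canny(gray, 100, 200)              # standard Canny thresholds
    return contrast, float((edges > 0).mean())     # fraction of edge pixels
```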

Experiments were conducted on the Stable Diffusion model with $l_{\infty}$-norm constraints. The number of sampling steps was set to 100, the per-step perturbation budget to 1/255, and the total budget to 17/255. Van Gogh's paintings were used as source images, and the default fused weight was $10^4$.
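
With these hyperparameters, the optimization amounts to an $l_\infty$-bounded PGD loop. A minimal sketch using the reported budgets and the hypothetical `mist_objective` above (the released implementation may differ):

```python
def mist_attack(x, unet, scheduler, encoder, y_target,
                steps=100, alpha=1/255, eps=17/255, w=1e4):
    """l_inf PGD: 100 steps, per-step budget 1/255, total budget 17/255."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = mist_objective(unet, scheduler, encoder, x + delta, y_target, w=w)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()          # ascent on the fused loss
            delta.clamp_(-eps, eps)                     # project onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)    # keep pixels in [0, 1]
        delta.grad = None
    return (x + delta).clamp(0, 1).detach()
```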

The effectiveness of Mist was evaluated under fine-tuning-based scenarios, including Textual Inversion, DreamBooth, and Scenario.gg, as well as in preventing image modification by image-to-image applications such as NovelAI. The results indicate that Mist effectively protects images from AI-for-Art mimicry. The robustness of Mist under preprocessing such as cropping and resizing was also evaluated, and the method was compared against Gaussian noise and Glaze; Mist was the only method that remained effective under the crop-and-resize input transformation.

The paper compares the different modes of Mist and their effectiveness under different scenarios, using Fréchet Inception Distance (FID) and Precision as metrics. Under the Textual Inversion scenario, the semantic mode demonstrated the highest effectiveness and robustness. In the DreamBooth scenario, the textural mode proved more effective and robust than the semantic mode. The fused mode combines the two, yielding balanced performance under both scenarios.
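
For reference, FID can be computed with off-the-shelf tooling such as torchmetrics. This sketches the general metric on placeholder batches, not the paper's exact evaluation pipeline:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Placeholder uint8 batches [N, 3, H, W]; in the paper's setting, "real" would
# be the source paintings and "fake" the outputs of the imitation pipeline.
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)  # Inception pool features
fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # higher FID after protection = stronger attack
```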

The choice of target image is closely tied to the robustness of Mist in the textural and fused modes: target images with high contrast and repetitive patterns yield stronger attacks and greater robustness against input transformations.

Authors (2)
  1. Chumeng Liang (10 papers)
  2. Xiaoyu Wu (43 papers)
Citations (31)