Adv-watermark: A Novel Watermark Perturbation for Adversarial Examples

Published 5 Aug 2020 in cs.CR and cs.MM | arXiv:2008.01919v2

Abstract: Recent research has demonstrated that adding imperceptible perturbations to original images can fool deep learning models. However, current adversarial perturbations usually take the form of noise and thus have no practical meaning. Image watermarking is a technique widely used for copyright protection. We can regard an image watermark as a kind of meaningful noise: adding it to the original image neither affects people's understanding of the image content nor arouses suspicion. Therefore, it is interesting to generate adversarial examples using watermarks. In this paper, we propose a novel watermark perturbation for adversarial examples (Adv-watermark) which combines image watermarking techniques and adversarial example algorithms. Adding a meaningful watermark to clean images can attack DNN models. Specifically, we propose a novel optimization algorithm, called Basin Hopping Evolution (BHE), to generate adversarial watermarks in the black-box attack mode. Thanks to BHE, Adv-watermark requires only a few queries to the threat models to complete the attacks. A series of experiments conducted on the ImageNet and CASIA-WebFace datasets show that the proposed method can efficiently generate adversarial examples and outperforms state-of-the-art attack methods. Moreover, Adv-watermark is more robust against image transformation defense methods.

Citations (73)

Summary

  • The paper presents Adv-watermark, a method that embeds visible watermarks using the Basin Hopping Evolution algorithm to create effective adversarial examples.
  • It utilizes alpha blending and a population-based search to optimize watermark transparency and placement in black-box attack scenarios.
  • Experimental results on ImageNet and CASIA-WebFace demonstrate high attack success rates and robustness against common defense techniques.

Introduction

The paper presents a technique that leverages watermarks as adversarial perturbations to attack deep neural networks (DNNs). Traditional adversarial perturbations usually take the form of imperceptible random noise with no practical meaning. In contrast, the proposed "Adv-watermark" method uses perceptible watermarks, such as logos or text, which are semantically meaningful and can be applied without arousing suspicion.

Methodology

The core of the proposed approach is the integration of image watermarking techniques with adversarial algorithms, creating a perturbation method that is effective in a black-box attack setting. The technique involves an alpha blending method to embed the watermark into images, where the transparency and position of the watermark are optimized via the newly proposed Basin Hopping Evolution (BHE) algorithm.
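
As a rough illustration of that embedding step, the following sketch alpha-blends a watermark into a host image. It is not the authors' code; the file names and parameter values are assumptions.

    # Minimal sketch of alpha-blending a watermark into a host image.
    # Not the paper's implementation; names and values are illustrative.
    import numpy as np
    from PIL import Image

    def embed_watermark(host, mark, x, y, alpha):
        """Alpha-blend `mark` onto `host` with its top-left corner at (x, y).

        alpha in [0, 1] is the transparency that BHE optimizes along with
        the position (x, y)."""
        host_arr = np.array(host, dtype=np.float32)
        mark_arr = np.array(mark, dtype=np.float32)
        h, w = mark_arr.shape[:2]
        # The watermark must fit entirely inside the host image.
        assert y + h <= host_arr.shape[0] and x + w <= host_arr.shape[1]
        region = host_arr[y:y + h, x:x + w]
        host_arr[y:y + h, x:x + w] = alpha * mark_arr + (1.0 - alpha) * region
        return Image.fromarray(host_arr.astype(np.uint8))

    host = Image.open("host.jpg").convert("RGB")   # hypothetical inputs
    logo = Image.open("logo.png").convert("RGB")
    candidate = embed_watermark(host, logo, x=40, y=40, alpha=0.6)

Because only (x, y, alpha) vary between candidates, the search space is effectively three-dimensional, which is part of what keeps the query budget low.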

The BHE algorithm improves on vanilla Basin Hopping (BH) with a population-based global search strategy. It addresses BH's tendency to become trapped in local minima by employing multiple starting points and crossover operations, which maintain population diversity and locate global optima more effectively.

Figure 1: Adversarial examples with watermark perturbations. The original class labels are in black text and the adversarial class labels are in red text.
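
A compressed sketch of such a BHE-style search loop is given below. The population size, hop distribution, crossover rule, and acceptance criterion are assumptions for illustration; the paper's exact operators may differ. Each fitness call corresponds to one query of the target model, so fitness values are cached.

    # Illustrative Basin Hopping Evolution (BHE) loop over (x, y, alpha)
    # candidates. Hyperparameters and operators are assumptions, not the
    # paper's exact settings.
    import random

    def bhe(fitness, init_solution, pop_size=20, iters=50, step=8):
        pop = [init_solution() for _ in range(pop_size)]
        fits = [fitness(s) for s in pop]            # one query per member
        for _ in range(iters):
            # Basin-hopping step: random hop from each member; keep if better.
            for i, (x, y, a) in enumerate(pop):
                cand = (max(0, x + random.randint(-step, step)),
                        max(0, y + random.randint(-step, step)),
                        min(1.0, max(0.0, a + random.uniform(-0.1, 0.1))))
                f = fitness(cand)
                if f > fits[i]:
                    pop[i], fits[i] = cand, f
            # Crossover: recombine two parents; replace the worst if improved.
            p, q = random.sample(range(pop_size), 2)
            child = (pop[p][0], pop[q][1], (pop[p][2] + pop[q][2]) / 2)
            cf = fitness(child)
            worst = min(range(pop_size), key=fits.__getitem__)
            if cf > fits[worst]:
                pop[worst], fits[worst] = child, cf
        best = max(range(pop_size), key=fits.__getitem__)
        return pop[best], fits[best]

Coordinates are clamped at zero here; a full implementation would also clamp against the host and watermark dimensions, as in the embedding sketch above.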

Practical Implementation

To implement the Adv-watermark technique, the BHE algorithm is initialized with the step size and transparency parameters and with candidate embedding positions for the watermark. The algorithm then evolves a population of solutions, each evaluated by whether the watermarked image is misclassified when passed through the target DNN; a sketch of such an evaluation follows.
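
That evaluation can be written as a black-box fitness function: embed the candidate watermark, query the target model once, and score how much probability mass has left the true class. The torchvision ResNet-101 below merely stands in for an arbitrary target model and is an assumption of this sketch (torchvision >= 0.13); it reuses embed_watermark from above.

    # Sketch of a black-box fitness function; the attacker sees only the
    # model's output scores, never its gradients or weights.
    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    model = models.resnet101(weights="IMAGENET1K_V1").eval()  # stand-in target
    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def make_fitness(host, logo, true_label):
        def fitness(candidate):
            x, y, alpha = candidate
            adv = embed_watermark(host, logo, x, y, alpha)
            with torch.no_grad():
                logits = model(preprocess(adv).unsqueeze(0))
            probs = torch.softmax(logits, dim=1)
            # Higher fitness = less confidence in the true class (untargeted).
            return 1.0 - probs[0, true_label].item()
        return fitness

An untargeted attack succeeds once the top-1 prediction is no longer true_label; a targeted variant would instead maximize the probability of a chosen target class.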

The proposed Adv-watermark approach employs watermarks in black-box attack scenarios, meaning attackers have no access to the internal workings of the DNN model. This is a significant advantage over previous methods that require detailed model knowledge to craft perturbations.

Figure 2: In this paper, we explore two kinds of media as the watermark: logos and texts. These six host images are randomly selected from ImageNet.

Experimental Results

Extensive evaluations on the ImageNet and CASIA-WebFace datasets demonstrate the effectiveness of Adv-watermark against state-of-the-art DNNs. Table 1 in the paper shows the method's high attack success rate across various DNN architectures, outperforming existing black-box adversarial methods. Adv-watermark is also notably robust to image-transformation defenses: JPEG compression and other common input transformations largely fail to remove its adversarial effect.
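
Such input-transformation defenses are simple to reproduce; for instance, a JPEG-compression check just re-encodes the adversarial image at a chosen quality and queries the model again (the quality value below is an assumption, not the paper's setting):

    # Sketch of a JPEG-compression defense check.
    import io
    from PIL import Image

    def jpeg_defense(img, quality=75):
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return Image.open(buf).convert("RGB")

A plausible reason for the reported robustness is that the watermark is a large, visible, low-frequency pattern rather than fine-grained high-frequency noise, so re-encoding removes little of it.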

Experiments show that the size and type of the watermark (text vs. logos), as well as its position on the image, significantly affect attack success rates. The visible watermarks serve a dual purpose: they carry the adversarial perturbation while also protecting the image's copyright.

Figure 3: The top row shows the original images (correctly classified by ResNet101) and their corresponding heat-maps (generated by the Grad-CAM algorithm). The bottom row shows the adversarial images with the visible watermark and their corresponding heat-maps.

Conclusion and Future Work

The Adv-watermark method introduces a dual-purpose adversarial technique that not only misleads DNN classifiers but also provides a form of copyright protection. By leveraging meaningful watermarks, the approach turns a practical, innocuous image operation into an effective perturbation against machine learning models. Future research could extend this concept to other perceptual media and investigate defense mechanisms against such attacks. Additionally, understanding how positional changes in watermarks affect DNN predictions might offer further insight into the models' inherent vulnerabilities.

The proposed BHE optimization method offers a new pathway for improving adversarial attack success rates with reduced query complexity, which is significant for practical applications in security-critical environments. Overall, the Adv-watermark technique enhances the landscape of adversarial research by introducing a method that is both efficient and applicable in real-world scenarios.
