- The paper presents Adv-watermark, a method that embeds visible watermarks using the Basin Hopping Evolution algorithm to create effective adversarial examples.
- It utilizes alpha blending and a population-based search to optimize watermark transparency and placement in black-box attack scenarios.
- Experimental results on ImageNet and CASIA-WebFace demonstrate high attack success rates and robustness against common defense techniques.
Adv-watermark: A Novel Watermark Perturbation for Adversarial Examples
Introduction
The paper presents a technique that leverages watermarks as adversarial perturbations to attack deep neural networks (DNNs). Traditional adversarial perturbations are typically noise-like: imperceptible to humans and devoid of semantic meaning. In contrast, the proposed "Adv-watermark" method uses perceptible watermarks, such as logos or text, which are semantically meaningful and can be applied to images without arousing suspicion.
Methodology
The core of the proposed approach is the integration of image watermarking with adversarial algorithms, yielding a perturbation method that works in a black-box attack setting. Alpha blending embeds the watermark into the host image, and the watermark's transparency and position are optimized by the newly proposed Basin Hopping Evolution (BHE) algorithm.
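As a concrete illustration, the alpha-blending step can be sketched as follows. This is a minimal sketch assuming NumPy image arrays and a rectangular watermark that fits inside the host image at position (x, y); the paper's implementation may additionally use a per-pixel mask for non-rectangular logos.

```python
import numpy as np

def embed_watermark(host, watermark, alpha, x, y):
    """Alpha-blend a watermark into a host image at position (x, y).

    host:      H x W x 3 uint8 array (the image to attack)
    watermark: h x w x 3 uint8 array (logo or rendered text), assumed
               to fit inside the host at (x, y)
    alpha:     transparency in [0, 1]; larger = more visible watermark
    """
    out = host.astype(np.float32)
    h, w = watermark.shape[:2]
    region = out[y:y + h, x:x + w]
    # Standard alpha blending: alpha * watermark + (1 - alpha) * host
    out[y:y + h, x:x + w] = alpha * watermark.astype(np.float32) + (1 - alpha) * region
    return np.clip(out, 0, 255).astype(np.uint8)
```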
BHE extends the vanilla Basin Hopping (BH) algorithm with a population-based global search. Where BH follows a single trajectory and is prone to getting trapped in local minima, BHE starts from multiple points and applies crossover operations to maintain population diversity, making it more likely to find a global optimum.
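The sketch below captures this idea in hypothetical form: a population of candidate vectors, BH-style random hops, and uniform crossover with greedy acceptance. The update rules and hyperparameters are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def bhe_attack(fitness, low, high, pop_size=10, iters=50, step=0.1, seed=None):
    """Sketch of Basin Hopping Evolution: population-based BH with crossover.

    fitness:   maps a candidate vector, e.g. (alpha, x, y), to a score to
               minimize (here: the model's confidence in the true class).
    low, high: per-parameter bounds as 1-D arrays of equal length.
    """
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    dim = low.size
    pop = rng.uniform(low, high, size=(pop_size, dim))   # multiple start points
    scores = np.array([fitness(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # BH-style random hop around the current candidate
            cand = np.clip(pop[i] + rng.normal(0.0, step * (high - low)), low, high)
            # Uniform crossover with a random member keeps the population diverse
            mask = rng.random(dim) < 0.5
            cand = np.where(mask, cand, pop[rng.integers(pop_size)])
            score = fitness(cand)
            if score < scores[i]:                        # greedy acceptance
                pop[i], scores[i] = cand, score
    best = int(scores.argmin())
    return pop[best], scores[best]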
Figure 1: Adversarial examples with watermark perturbations. The original class labels are in black text and the adversarial class labels are in red text.
Practical Implementation
To implement the Adv-watermark technique, the BHE algorithm is initialized with a hop step size and a population of candidate solutions, each encoding a watermark transparency and an embedding position. The population then evolves, with each candidate evaluated on its ability to make the target DNN misclassify the watermarked image.
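A hypothetical fitness function for this evaluation might look as follows. Here `model_predict` stands in for query-only access to the target DNN and is an assumed interface, and `embed_watermark` is the blending sketch from above.

```python
import numpy as np

def make_fitness(model_predict, host, watermark, true_label):
    """Build a black-box fitness function for one host image.

    model_predict: assumed query-only interface returning class
                   probabilities for a batch of images.
    Relies on embed_watermark() from the blending sketch above.
    """
    def fitness(candidate):
        alpha, x, y = candidate
        adv = embed_watermark(host, watermark, float(alpha), int(x), int(y))
        probs = model_predict(adv[np.newaxis])[0]
        # Lower true-class confidence is better; the attack succeeds
        # once argmax(probs) != true_label.
        return probs[true_label]
    return fitness
```

Running `bhe_attack(make_fitness(...), low, high)` then searches over transparency and position for a watermark placement that flips the prediction.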
The proposed Adv-watermark approach operates in a black-box attack scenario: the attacker can only query the model's outputs and has no access to its internal parameters or gradients. This is a significant advantage over earlier methods that require white-box access to craft perturbations.
Figure 2: In this paper, we explore two kinds of media as the watermark: logos and text. These six host images are randomly selected from ImageNet.
Experimental Results
Extensive evaluations on the ImageNet and CASIA-WebFace datasets demonstrate the effectiveness of Adv-watermark against state-of-the-art DNNs. Table 1 in the paper reports high attack success rates across various DNN architectures, outperforming existing black-box adversarial methods. Adv-watermark is also notably robust to image-transformation defenses: JPEG compression and other common preprocessing defenses fail to mitigate its effect.
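To see what such a robustness check involves, here is a small sketch, assuming Pillow for the JPEG round-trip and the same assumed `model_predict` interface as above; the quality level and helper names are illustrative.

```python
import io
import numpy as np
from PIL import Image

def jpeg_compress(image, quality=75):
    """Round-trip a uint8 RGB image through JPEG, a common
    input-transformation defense."""
    buf = io.BytesIO()
    Image.fromarray(image).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf).convert("RGB"))

def survives_jpeg_defense(model_predict, adv_image, true_label, quality=75):
    """True if the watermarked image still fools the model after the defense."""
    probs = model_predict(jpeg_compress(adv_image, quality)[np.newaxis])[0]
    return int(np.argmax(probs)) != true_label
```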
Experiments further show that the watermark's size, type (text vs. logo), and position on the image all significantly affect the attack success rate; a simple position sweep like the one sketched below makes this dependence easy to probe. Because the embedded watermarks are visible, they serve a dual purpose: they act as adversarial perturbations while also marking image copyright.
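The following is a minimal, hypothetical experiment, reusing `embed_watermark` and the assumed `model_predict` interface, that grids a fixed watermark over the image and records where the attack succeeds.

```python
import numpy as np

def sweep_positions(model_predict, host, watermark, true_label,
                    alpha=0.6, stride=32):
    """Grid a fixed watermark over the host image and record, per position,
    whether the attack succeeds. Reuses embed_watermark() from above."""
    H, W = host.shape[:2]
    h, w = watermark.shape[:2]
    success = {}
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            adv = embed_watermark(host, watermark, alpha, x, y)
            probs = model_predict(adv[np.newaxis])[0]
            success[(x, y)] = int(np.argmax(probs)) != true_label
    return success
```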
Figure 3: The top row shows the original images (correctly classified by ResNet101) and their corresponding heat-maps (generated by the Grad-CAM algorithm). The bottom row shows the adversarial images with the visible watermark and their corresponding heat-maps.
Conclusion and Future Work
The Adv-watermark method introduces a dual-purpose adversarial technique that not only misleads DNN classifiers but also provides a form of copyright protection. By leveraging semantically meaningful watermarks, it turns a practical, everyday image operation into an adversarial perturbation. Future research could extend the concept to other perceptual media and investigate defenses against such attacks. Additionally, studying how positional changes in watermarks affect DNN predictions may provide further insight into the models' inherent vulnerabilities.
The proposed BHE optimizer offers a new path to higher adversarial attack success rates with reduced query complexity, which matters for practical, security-critical applications. Overall, Adv-watermark contributes to adversarial research a method that is both efficient and applicable in real-world scenarios.