- The paper demonstrates that randomized, non-differentiable transformations, in particular total variance minimization (TVM) and image quilting, substantially improve CNN robustness, reaching up to 80-90% accuracy on adversarial images when classifiers are trained on transformed inputs.
- It systematically evaluates multiple input transformation methods—including cropping-rescaling, bit-depth reduction, JPEG compression, TVM, and quilting—across gray-box and black-box attack scenarios.
- The findings imply that ensemble strategies combining diverse transformations can effectively mitigate vulnerabilities in security-critical AI applications without specialized adversarial training.
The paper by Chuan Guo et al. investigates the efficacy of input transformations as a defense against adversarial attacks on image classification systems. The researchers explore several transformation techniques, with particular emphasis on non-deterministic, non-differentiable ones, to sanitize adversarially perturbed images before they are fed into convolutional neural networks (CNNs).
Key Techniques and Evaluation
The proposed defenses include image cropping-rescaling, bit-depth reduction, JPEG compression, total variance minimization (TVM), and image quilting. The authors provide both qualitative and quantitative evaluations of these methods against several advanced attack algorithms: fast gradient sign method (FGSM), iterative FGSM (I-FGSM), DeepFool, and Carlini-Wagner L2 attack (CW-L2).
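To make the simpler transformations concrete, here is a minimal sketch of bit-depth reduction and JPEG compression applied as input preprocessing; the function names and the 3-bit / quality-75 settings are illustrative assumptions rather than the paper's exact configuration.

```python
import io

import numpy as np
from PIL import Image


def reduce_bit_depth(img: np.ndarray, bits: int = 3) -> np.ndarray:
    """Quantize an HxWx3 uint8 image to `bits` bits per color channel."""
    step = 256 // (2 ** bits)
    return (img // step) * step


def jpeg_compress(img: np.ndarray, quality: int = 75) -> np.ndarray:
    """Round-trip an HxWx3 uint8 image through lossy JPEG encoding."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))
```

Either function would be applied to an incoming image before it is passed to the classifier.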
Experimental Results
The experiments are extensive, covering both gray-box and black-box attack settings. In the gray-box scenario, where the attacker has access to the model's architecture and parameters but not to the defense, image cropping, TVM, and image quilting were notably effective. TVM and image quilting in particular are robust because they are randomized and non-differentiable, making it hard for an attacker to engineer perturbations that survive them (a toy TVM sketch follows the list below). The results highlight:
- Applied only at test time, without retraining, TVM and image quilting recovered up to 50% classification accuracy on adversarial images.
- When the classifiers were retrained on transformed images, TVM and image quilting became markedly more robust, reaching 80-90% accuracy against several strong attacks.
- Combining defenses yielded further gains, suggesting that ensembles of multiple transformations offer stronger protection than any single one.
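As a rough illustration of the TVM defense mentioned above, the sketch below randomly keeps a Bernoulli-sampled subset of pixels and reconstructs the image by gradient descent on a data-fidelity plus smoothed total-variation objective. The keep probability, TV weight, step size, and iteration count are illustrative assumptions; the paper's actual solver differs.

```python
import numpy as np


def tvm_defense(x, keep_prob=0.5, tv_weight=0.03, steps=200, lr=0.1, eps=1e-6, seed=0):
    """x: HxWxC float image in [0, 1]; returns a TV-minimized reconstruction."""
    rng = np.random.default_rng(seed)
    mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)  # randomly kept pixels
    z = x.copy()
    for _ in range(steps):
        # Gradient of the data-fidelity term ||mask * (z - x)||^2.
        grad = 2.0 * mask * (z - x)
        # Subgradient of a smoothed anisotropic total-variation term.
        dh = np.diff(z, axis=0)
        dv = np.diff(z, axis=1)
        gh = dh / np.sqrt(dh ** 2 + eps)
        gv = dv / np.sqrt(dv ** 2 + eps)
        tv_grad = np.zeros_like(z)
        tv_grad[1:, :, :] += gh
        tv_grad[:-1, :, :] -= gh
        tv_grad[:, 1:, :] += gv
        tv_grad[:, :-1, :] -= gv
        grad += tv_weight * tv_grad
        z = np.clip(z - lr * grad, 0.0, 1.0)
    return z
```

Because the pixel mask is resampled for every input, an attacker cannot precompute a perturbation that survives the exact reconstruction the classifier will see.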
Implications
The theoretical argument is that non-differentiable and stochastic transformations disrupt the carefully structured nature of adversarial perturbations while denying the attacker a gradient to optimize against. This extends defense strategies beyond purely deterministic denoising methods such as JPEG compression or bit-depth reduction, which are easier to circumvent once an attacker knows they are in place.
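One facet of this argument can be shown numerically with the simplest non-differentiable transformation, bit-depth reduction: the operation is piecewise constant, so perturbations smaller than one quantization step are erased and its gradient is zero almost everywhere, which blinds gradient-based attacks (TVM and quilting add randomness on top of non-differentiability). The pixel values below are arbitrary examples.

```python
import numpy as np

step = 32                                    # 3-bit quantization: 256 / 8 levels
x = np.array([100, 101, 130], dtype=np.uint8)
x_adv = x + np.uint8(4)                      # small adversarial-scale perturbation
print((x // step) * step)                    # [ 96  96 128]
print((x_adv // step) * step)                # [ 96  96 128] -> perturbation erased
```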
The practical implications are significant for real-world deployment in security-critical AI applications, such as autonomous driving and medical diagnostics. Combining multiple transformation strategies can mitigate vulnerabilities without requiring specific adversarial training, making the approach versatile across different model architectures and types of attacks.
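A simple way to realize such a combination at inference time is to average the classifier's predicted probabilities over several randomly cropped-and-rescaled copies of the input. The crop fraction, output size, sample count, and the `model` callable (mapping an HxWx3 uint8 image to a probability vector) are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
from PIL import Image


def random_crop_rescale(img: np.ndarray, crop_frac: float = 0.9, size: int = 224, rng=None):
    """Take a random crop covering `crop_frac` of the image and resize it."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = Image.fromarray(img[top:top + ch, left:left + cw])
    return np.array(crop.resize((size, size), Image.BILINEAR))


def ensemble_predict(model, img: np.ndarray, n_samples: int = 10, seed: int = 0):
    """Average the model's probability vectors over randomized crops."""
    rng = np.random.default_rng(seed)
    probs = [model(random_crop_rescale(img, rng=rng)) for _ in range(n_samples)]
    return np.mean(probs, axis=0)
```

The same wrapper could first apply one of the other transformations (for example, JPEG compression or TVM) before cropping, stacking defenses without touching the model itself.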
Future Developments
Future research might integrate these transformation techniques with other robust optimization strategies. Extending the same ideas to modalities beyond images, such as audio and text, would broaden the practical utility of these defenses.
Overall, the paper demonstrates that well-designed transformations can serve as potent defenses against adversarial attacks, leveraging inherent randomness and non-differentiability to bolster the robustness of CNN-based models.