- The paper demonstrates that feature denoising blocks improve the adversarial robustness of CNNs by suppressing the noise that adversarial perturbations induce in internal feature maps.
- The authors integrate non-local-means-based denoising blocks, each wrapping the denoising operation with a 1x1 convolution and a residual connection to preserve the underlying signal.
- Experiments on ImageNet show 55.7% accuracy under 10-iteration white-box PGD attacks and state-of-the-art performance in black-box evaluations, including first place in the CAAD 2018 defense competition.
Feature Denoising for Improving Adversarial Robustness
The paper "Feature Denoising for Improving Adversarial Robustness" by Xie et al. introduces a novel approach to enhancing the adversarial robustness of convolutional neural networks (CNNs) by incorporating feature denoising mechanisms. This paper posits that adversarial perturbations on images induce substantial noise within the internal feature maps of CNNs, thereby compromising their performance. The authors propose the integration of feature denoising blocks within network architectures to mitigate this perturbation.
Methodology
The core proposition of the paper is the enhancement of CNN architectures through specially designed denoising blocks. These blocks are designed to suppress the noise that adversarial perturbations introduce into feature maps. The authors consider several denoising operations, including non-local means, bilateral filters, mean filters, and median filters, and investigate how effectively each improves adversarial robustness.
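As a point of reference, here is a minimal sketch (not the authors' code) of two of the simpler denoising operations applied directly to a feature map; the function names and the assumption of PyTorch tensors of shape (N, C, H, W) are illustrative choices.

```python
# Illustrative sketch of simple feature-map denoising operations (assumed
# PyTorch API); a mean filter via average pooling and a median filter via
# unfold. These are sketches of the kinds of filters the paper ablates,
# not the reference implementation.
import torch
import torch.nn.functional as F

def mean_filter(x, k=3):
    """k x k mean filter over each channel of a (N, C, H, W) feature map."""
    return F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)

def median_filter(x, k=3):
    """k x k median filter built from unfold; simple but memory-hungry."""
    n, c, h, w = x.shape
    patches = F.unfold(x, kernel_size=k, padding=k // 2)   # (N, C*k*k, H*W)
    patches = patches.view(n, c, k * k, h * w)
    return patches.median(dim=2).values.view(n, c, h, w)

feats = torch.randn(2, 64, 56, 56)        # dummy feature map
smoothed = mean_filter(feats)
robustified = median_filter(feats)
```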
The generic denoising block wraps the denoising operation (non-local means in the strongest variant) with a 1x1 convolutional layer and a residual connection. The design is related to self-attention mechanisms and non-local networks, and it aims to preserve the useful signal while suppressing the noise.
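A hedged sketch of such a block is shown below, assuming the embedding-free Gaussian (softmax) weighting over pairwise dot products; the class name, tensor shapes, and initialization details are assumptions for illustration rather than the authors' reference implementation.

```python
# Sketch of a non-local-means denoising block with a 1x1 conv and a residual
# connection (assumptions: PyTorch, softmax/"Gaussian" weighting, no embedding).
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.view(n, c, h * w)                        # (N, C, HW)
        # Pairwise affinities (dot products) between all spatial positions.
        affinity = torch.bmm(flat.transpose(1, 2), flat)  # (N, HW, HW)
        weights = torch.softmax(affinity, dim=-1)         # Gaussian (softmax) weighting
        # Each output position is a weighted mean of all input positions.
        denoised = torch.bmm(flat, weights.transpose(1, 2)).view(n, c, h, w)
        # 1x1 conv + residual connection help preserve the useful signal.
        return x + self.conv(denoised)

block = DenoisingBlock(256)
out = block(torch.randn(1, 256, 14, 14))   # output has the same shape as the input
```

The softmax over pairwise dot products is what makes each output feature a weighted average of features at all positions, which is the non-local-means behavior the paper builds on.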
Experimental Results
The paper provides an extensive evaluation of the proposed method's effectiveness against both white-box and black-box adversarial attacks on the ImageNet dataset.
White-Box Attacks
In white-box settings, adversarial robustness is tested using Projected Gradient Descent (PGD) attacks with varying iteration counts. The results indicate that integrating feature denoising significantly improves robustness: the denoising model achieves 55.7% accuracy under a 10-iteration PGD attack, compared with 27.9% for the prior state of the art, adversarial logit pairing (ALP). Remarkably, even under an extreme 2000-iteration PGD attack, the model maintains 42.6% accuracy.
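For context, a minimal untargeted L-infinity PGD loop looks roughly like the sketch below; the epsilon, step size, and iteration count are placeholder values, not the exact settings used in the paper's evaluation.

```python
# Minimal PGD sketch (untargeted, L-infinity constraint). Hyperparameters are
# illustrative placeholders; the paper's evaluation protocol may differ.
import torch

def pgd_attack(model, images, labels, eps=16 / 255, alpha=1 / 255, steps=10):
    """Iteratively ascend the loss, then project the perturbation back into
    the eps-ball around the clean images and clip to a valid image range."""
    loss_fn = torch.nn.CrossEntropyLoss()
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                  # gradient ascent step
            adv = images + (adv - images).clamp(-eps, eps)   # project to eps-ball
            adv = adv.clamp(0, 1)                            # keep a valid image
    return adv.detach()
```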
Ablation studies further show that the non-local (Gaussian) denoising operation yields the largest improvement. They also highlight the importance of the 1x1 convolution and the residual connection within the denoising block: removing either markedly degrades robustness.
Black-Box Attacks
In the black-box attack scenario, which employs the five best attackers from the CAAD 2017 competition, the proposed method demonstrates superior robustness. Under a stringent "all-or-nothing" evaluation criterion, in which an image counts as correct only if it is classified correctly under every attack, the model attains 49.5% accuracy, significantly outperforming baseline methods and the best-performing entries from the previous year.
Notably, in the CAAD 2018 competition, the method achieved first place, with an accuracy of 50.6% against 48 unknown attackers on a secret, ImageNet-like test dataset. This success underscores the practical utility and robustness of the feature denoising approach in highly competitive and unpredictable environments.
Implications and Future Work
The integration of feature denoising blocks into CNNs represents a promising advancement in improving adversarial robustness. This approach not only demonstrates efficacy in current settings but also suggests a new architectural design principle for future models.
Theoretical implications point towards a better understanding of how adversarial perturbations affect internal feature representations and how these effects can be mitigated. Practically, the incorporation of feature denoising can be applied to various domains where adversarial robustness is critical, such as autonomous driving, medical imaging, and security systems.
Future work could explore further refinements of denoising mechanisms, adaptive denoising strategies based on feature distributions, and extending this approach to other types of adversarial attacks and domains. Additionally, the trade-offs between clean performance and adversarial robustness remain an interesting research avenue, particularly for applications requiring both high accuracy and robustness.
In summary, the paper presents a methodologically sound and empirically validated approach to enhancing adversarial robustness in CNNs through feature denoising. It sets a new direction for adversarial defense research and opens up numerous possibilities for further exploration and application.