Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser (1712.02976v2)

Published 8 Dec 2017 in cs.CV

Abstract: Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security sensitive systems. We propose high-level representation guided denoiser (HGD) as a defense for image classification. Standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model's outputs activated by the clean image and denoised image. Compared with ensemble adversarial training which is the state-of-the-art defending method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In NIPS competition on defense against adversarial attacks, our HGD solution won the first place and outperformed other models by a large margin.

Citations (815)

Summary

  • The paper introduces a high-level representation guided denoiser that effectively counteracts adversarial perturbations in neural networks.
  • It employs novel loss functions based on feature maps and logits to enhance model robustness under various attack scenarios.
  • Experimental results on ImageNet demonstrate improved accuracy, efficiency, and transferability compared to traditional defense methods.

Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser

In the paper "Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser," Fangzhou Liao et al. present an advanced method to enhance the robustness of neural networks against adversarial attacks. The authors introduce a denoising strategy, leveraging high-level representation guidance to mitigate the impact of adversarial perturbations on image classification tasks. This approach addresses key limitations of conventional denoising methods and adversarial training, leading to significant improvements in robustness and generalizability.

Overview of High-Level Representation Guided Denoiser (HGD)

The high-level representation guided denoiser (HGD) counteracts the vulnerability of neural networks to adversarial examples, inputs perturbed in subtle but deliberate ways to mislead a model. Traditional pixel-level denoisers suffer from the error amplification effect: small residual adversarial noise left after denoising is progressively magnified as it propagates through the layers of the network and still causes incorrect classifications. HGD avoids this by defining its loss at the representation level, as the difference between the target model's outputs on the clean image and on the denoised image.
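
To make the defense pipeline concrete, the following is a minimal PyTorch-style sketch (not the authors' released code) of how a trained denoiser is used at test time: the incoming image is passed through the denoiser, and the frozen target classifier then operates on the cleaned image. The `denoiser` and `target_model` objects are assumed to be pre-trained modules, and the denoiser is assumed to return the denoised image directly (DUNET internally predicts the noise and subtracts it).

```python
import torch

@torch.no_grad()
def defended_predict(x, denoiser, target_model):
    """Classify a (possibly adversarial) batch of images after denoising.

    x            : input images, shape (N, 3, H, W)
    denoiser     : trained HGD-style denoiser (e.g., a DUNET)
    target_model : frozen classifier the denoiser was trained to protect
    """
    denoiser.eval()
    target_model.eval()
    x_denoised = denoiser(x)             # strip the adversarial perturbation
    logits = target_model(x_denoised)    # classify the cleaned image
    return logits.argmax(dim=1)          # predicted class indices
```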

Methodology and Models

The authors explore two primary denoiser models: the Denoising Autoencoder (DAE) and the Denoising U-Net (DUNET). DUNET extends the DAE with lateral connections and residual learning, so the network predicts the adversarial noise to be subtracted from the input rather than reconstructing the whole image, which scales better to high-resolution inputs. HGD then goes a step further by redefining the loss function over high-level representations of the target model instead of pixel-level differences. Two variants of HGD are proposed:

  1. Feature Guided Denoiser (FGD): Uses the feature maps from the top convolutional layer of the target model.
  2. Logits Guided Denoiser (LGD): Uses the target model's logits, tying the denoising objective directly to the classification output.

An additional comparison is drawn against the class-label guided denoiser (CGD), which is trained with a label-based classification loss; unlike CGD, FGD and LGD require no ground-truth labels, so these HGD variants are trained in an unsupervised fashion. A short sketch combining both guided losses follows.
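
The snippet below is an illustrative PyTorch-style sketch under stated assumptions, not the authors' implementation: it measures the L1 discrepancy between the target model's high-level outputs on the clean image and on the denoised adversarial image. Here `target_model.features` is a hypothetical accessor returning the topmost convolutional feature map (FGD), while a plain forward pass returns the logits (LGD).

```python
import torch
import torch.nn.functional as F

def hgd_loss(denoiser, target_model, x_clean, x_adv, guide="logits"):
    """Representation-guided denoising loss (illustrative sketch).

    guide = "logits"  -> LGD: compare logits for clean vs. denoised images
    guide = "feature" -> FGD: compare topmost convolutional feature maps
    """
    x_denoised = denoiser(x_adv)

    def represent(images):
        # 'features' is a hypothetical accessor for the top conv feature map
        return target_model(images) if guide == "logits" \
            else target_model.features(images)

    with torch.no_grad():                 # reference from the clean image
        ref = represent(x_clean)

    out = represent(x_denoised)           # gradients flow back to the denoiser

    # L1 distance between high-level representations guides the denoiser
    return F.l1_loss(out, ref)
```

During training, only the denoiser's parameters are given to the optimizer, so minimizing this loss pulls the denoised image's high-level representation toward that of the clean image without altering the target classifier.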

Experimental Findings

The method is evaluated on the ImageNet dataset against a variety of adversarial attacks. The training set consists of 30K images, from which 210K adversarial examples are generated using multiple attack methods, including FGSM and IFGSM (a minimal sketch of FGSM-style example generation follows the findings below). The results demonstrate that HGD significantly outperforms both pixel-level denoisers and state-of-the-art adversarial training methods. Key findings include:

  • Robustness: HGD variants, particularly LGD, achieve higher classification accuracy under both white-box and black-box attacks.
  • Efficiency: HGD requires significantly less training data and time compared to ensemble adversarial training methods.
  • Transferability: HGD trained on one target model (e.g., Inception v3) can effectively defend other models (e.g., ResNet), and generalizes well across different image classes.
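
For context on how the adversarial training examples mentioned above can be produced, here is a minimal FGSM sketch in PyTorch; the step sizes, the attacked loss, and the iterated IFGSM variant used in the paper may differ from this illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Generate FGSM adversarial examples for images x with labels y.

    A single gradient-sign step of size epsilon; IFGSM repeats smaller
    steps while clipping the total perturbation to an epsilon ball.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss the attacker wants to raise
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # step along the gradient sign
    return x_adv.clamp(0, 1).detach()     # keep pixel values in a valid range
```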

Implications and Future Directions

The implications of this research span both practical and theoretical realms. Practically, the HGD offers a viable, efficient defense mechanism suitable for deployment in various security-sensitive applications such as autonomous driving and identity authentication. Theoretically, the approach underscores the benefits of leveraging high-level neural representations to counter adversarial perturbations, opening avenues for further research into advanced, model-agnostic defense mechanisms.

Future work could focus on selecting attack sets that yield more robust trained denoisers, integrating real-time adaptation mechanisms, and exploring end-to-end training frameworks that incorporate adversarial transformations.

Conclusion

The paper by Liao et al. introduces a significant advancement in defending neural networks against adversarial attacks through the high-level representation guided denoiser. By resolving the error amplification effect and demonstrating robust, efficient defense capabilities, HGD stands out as a promising solution that not only maintains the integrity of clean images but also excels under diverse adversarial conditions. This work lays a foundational pathway for future defensive strategies in the adversarial machine learning landscape.