
Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples (1803.05787v2)

Published 14 Mar 2018 in cs.CV and cs.CR

Abstract: Image compression-based approaches for defending against the adversarial-example attacks, which threaten the safety use of deep neural networks (DNN), have been investigated recently. However, prior works mainly rely on directly tuning parameters like compression rate, to blindly reduce image features, thereby lacking guarantee on both defense efficiency (i.e. accuracy of polluted images) and classification accuracy of benign images, after applying defense methods. To overcome these limitations, we propose a JPEG-based defensive compression framework, namely "feature distillation", to effectively rectify adversarial examples without impacting classification accuracy on benign data. Our framework significantly escalates the defense efficiency with marginal accuracy reduction using a two-step method: First, we maximize malicious features filtering of adversarial input perturbations by developing defensive quantization in frequency domain of JPEG compression or decompression, guided by a semi-analytical method; Second, we suppress the distortions of benign features to restore classification accuracy through a DNN-oriented quantization refine process. Our experimental results show that proposed "feature distillation" can significantly surpass the latest input-transformation based mitigations such as Quilting and TV Minimization in three aspects, including defense efficiency (improve classification accuracy from $\sim20\%$ to $\sim90\%$ on adversarial examples), accuracy of benign images after defense ($\le1\%$ accuracy degradation), and processing time per image ($\sim259\times$ Speedup). Moreover, our solution can also provide the best defense efficiency ($\sim60\%$ accuracy) against the recent adaptive attack with least accuracy reduction ($\sim1\%$) on benign images when compared with other input-transformation based defense methods.

Authors (7)
  1. Zihao Liu (36 papers)
  2. Qi Liu (485 papers)
  3. Tao Liu (350 papers)
  4. Nuo Xu (37 papers)
  5. Xue Lin (92 papers)
  6. Yanzhi Wang (197 papers)
  7. Wujie Wen (37 papers)
Citations (237)

Summary

Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples

The paper "Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples" introduces a defensive framework that enhances the robustness of Deep Neural Networks (DNNs) against adversarial examples by leveraging JPEG compression. The proposed method, termed "feature distillation", rectifies adversarial inputs while preserving classification accuracy on benign images. The defense centers on optimizing JPEG quantization to target adversarial perturbations specifically.

Approach and Methodology

The authors identify two significant limitations of prior JPEG-based defenses: inefficiency in removing adversarial perturbations and the resulting accuracy loss on benign images. They propose a two-step solution. First, they improve defense efficiency through a defensive quantization step applied in the frequency domain during JPEG compression. By analyzing how adversarial perturbations distribute across frequency bands, they design a quantization scheme that maximizes the suppression of adversarial features.
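The frequency-selective quantization idea can be sketched as follows. This is a minimal illustration, not the paper's exact quantization table: it assumes a single coarse step `q_high` for the frequency bands where adversarial perturbations are presumed to concentrate, and a finer step `q_low` elsewhere; the `band_split` threshold on `u + v` is a hypothetical band boundary.

```python
import numpy as np
from scipy.fft import dctn, idctn

def defensive_quantize(image, q_low=20.0, q_high=50.0, band_split=8):
    """Sketch of frequency-selective JPEG-style quantization.

    Processes the image in 8x8 DCT blocks, quantizing coefficients with
    a coarser step in higher-frequency bands (u + v >= band_split) to
    suppress perturbation-like content, and a finer step elsewhere.
    """
    h, w = image.shape
    out = image.astype(np.float64).copy()
    # Per-coefficient quantization steps for one 8x8 block.
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    q = np.where(u + v >= band_split, q_high, q_low)
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            block = out[i:i+8, j:j+8] - 128.0           # level shift, as in JPEG
            coeffs = dctn(block, norm="ortho")
            coeffs = np.round(coeffs / q) * q           # quantize then dequantize
            out[i:i+8, j:j+8] = idctn(coeffs, norm="ortho") + 128.0
    return np.clip(out, 0.0, 255.0)
```

Raising `q_high` removes more fine-grained detail per block, which is the knob the paper's semi-analytical method tunes against the observed perturbation distribution.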

Second, the authors address the degradation of benign features by proposing a refinement process oriented toward the characteristics of DNNs. They analyze how important each frequency component is to the model's predictions and adjust the corresponding quantization steps accordingly, keeping fine quantization where the DNN is sensitive.
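One simple way to probe per-frequency importance is to perturb a single 8x8 DCT band across the image and measure how much a model score changes. The sketch below is an illustrative sensitivity probe, not the paper's exact procedure; `score_fn` is a hypothetical stand-in for any scalar model output (e.g. the logit of the true class).

```python
import numpy as np
from scipy.fft import idctn

def band_importance(image, score_fn, eps=1.0):
    """Estimate DNN sensitivity to each 8x8 DCT frequency band.

    For every frequency (u, v), injects the corresponding spatial DCT
    basis pattern (scaled by eps) into every 8x8 block and records the
    absolute change in score_fn. Bands with high importance would keep
    a fine quantization step during refinement.
    """
    h, w = image.shape
    base = score_fn(image)
    imp = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            onehot = np.zeros((8, 8))
            onehot[u, v] = eps
            basis = idctn(onehot, norm="ortho")        # spatial pattern of band (u, v)
            delta = np.tile(basis, (h // 8, w // 8))   # perturb every block equally
            imp[u, v] = abs(score_fn(image + delta) - base)
    return imp
```

In practice the score function would wrap a trained classifier, and the resulting importance map would be thresholded to decide which bands receive coarse versus fine quantization steps.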

Experimental Results

The experiments show that the proposed feature distillation significantly outperforms standard and advanced input-transformation defenses such as Quilting and Total Variation Minimization. Classification accuracy on adversarial samples improves from approximately 20% to roughly 90%, with at most a 1% accuracy drop on benign samples. Processing speed also improves substantially, with a roughly 259x speedup over prior methods.

Implications and Future Directions

This work illustrates an effective application of JPEG compression tailored for adversarial robustness, diverging from traditional approaches based on the human visual system (HVS) toward a DNN-centric methodology. The paper paves the way for model-agnostic defenses that tailor input transformations to neural network characteristics, suggesting a promising research direction toward adaptive transformations derived from model analysis.

Future works could explore the adaptation of the proposed framework across diverse model architectures and datasets, as well as real-world deployment scenarios where adversarial attacks could have critical consequences. The potential to integrate such defense mechanisms directly into existing imaging and networking pipelines is another promising avenue of exploration, enhancing robustness potentially at the point of image acquisition.

In summary, the research not only provides a novel angle on leveraging JPEG compression techniques for enhancing adversarial defense but also advances the understanding of how model characteristics can be aligned with input transformations to bolster neural network resilience. This paper inspires further exploration into adaptive defenses, potentially leading to more robust models capable of withstanding evolving adversarial strategies.