- The paper presents DVERGE, which diversifies adversarial vulnerabilities across the sub-models of a CNN ensemble to blunt transfer attacks.
- It employs feature distillation to isolate each sub-model's vulnerabilities and a round-robin training procedure to balance vulnerability diversification with high clean-data accuracy.
- Experiments show adversarial transferability between sub-models reduced to as low as 3-6% and improved ensemble robustness against black-box attacks.
Analysis of DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles
The paper introduces DVERGE, a novel ensemble training technique for convolutional neural networks (CNNs) under adversarial attack. By diversifying the adversarial vulnerabilities of the sub-models within an ensemble, DVERGE improves robustness against black-box transfer attacks while maintaining high clean-data accuracy.
Background
CNNs for image classification are susceptible to adversarial examples: inputs with small, carefully crafted perturbations that cause misclassification. Such vulnerabilities often transfer across different models trained on the same dataset, which is what makes black-box transfer attacks possible. Traditional adversarial training improves robustness, but typically at the cost of reduced accuracy on clean data. Conversely, existing ensemble methods do not diversify their sub-models' outputs enough to prevent adversarial transfer attacks effectively.
Methodology
DVERGE introduces a unique approach to ensemble training by isolating and diversifying adversarial vulnerabilities across sub-models. The key component is a diversity metric, which employs feature distillation to identify and measure the overlap of adversarial vulnerabilities between models. More specifically, the method involves:
- Distilling non-robust features for each sub-model, which are defined as those features that are sensitive to adversarial perturbations.
- Employing a training objective that minimizes the overlap of adversarial vulnerabilities by maximizing the pairwise diversity across sub-models using these distilled features.
- Implementing a round-robin training procedure to balance vulnerability diversification and classification accuracy across the ensemble.
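The distillation and round-robin steps above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the function names, the use of projected gradient descent for the inner feature-matching optimization, and all hyperparameter values (`eps`, `alpha`, `steps`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def distill_feature(feat_extractor, x_feat, x_base, eps=0.07, alpha=0.007, steps=10):
    """Find z near x_base whose intermediate features match those of x_feat.

    Approximates  argmin_z ||feat(z) - feat(x_feat)||^2  s.t. ||z - x_base||_inf <= eps
    via projected gradient descent. The result carries the (non-robust) features
    the extractor sees in x_feat, while visually staying close to x_base.
    """
    with torch.no_grad():
        target_feat = feat_extractor(x_feat)
    z = x_base.clone().detach()
    for _ in range(steps):
        z.requires_grad_(True)
        loss = ((feat_extractor(z) - target_feat) ** 2).sum()
        grad, = torch.autograd.grad(loss, z)
        with torch.no_grad():
            z = z - alpha * grad.sign()                 # descend on feature-matching loss
            z = x_base + (z - x_base).clamp(-eps, eps)  # project into the eps-ball
            z = z.clamp(0.0, 1.0)                       # stay a valid image
    return z.detach()

def dverge_round_robin_step(models, feat_layers, x, y, optimizers, eps=0.07):
    """One illustrative round-robin step: each sub-model f_i is trained to
    correctly classify examples distilled from every *other* sub-model f_j,
    so f_j's non-robust features stop fooling f_i."""
    perm = torch.randperm(x.size(0))
    x_feat = x[perm]  # random partner whose non-robust features get distilled
    ce = nn.CrossEntropyLoss()
    total = 0.0
    for i, model_i in enumerate(models):
        loss = 0.0
        for j in range(len(models)):
            if i == j:
                continue
            # z looks like x but carries f_j's non-robust features of x_feat;
            # f_i must still predict x's true label y.
            z = distill_feature(feat_layers[j], x_feat, x, eps=eps)
            loss = loss + ce(model_i(z), y)
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()
        total += loss.item()
    return total
```

Note that gradients from the distillation inner loop flow only into `z` (via `torch.autograd.grad`), so each optimizer step updates only the sub-model currently being trained, as the round-robin schedule requires.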
Results
Experimental evaluations demonstrate the effectiveness of DVERGE:
- DVERGE achieves significantly higher robustness against black-box transfer attacks compared to other ensemble methods like ADP and GAL, effectively reducing the transferability of adversarial examples between sub-models to as low as 3-6%.
- Ensembles trained with DVERGE show improved robustness with an increasing number of sub-models, maintaining high clean accuracy and effectively mitigating the trade-off usually seen with adversarial training methods.
- The paper explores combining DVERGE with adversarial training, showing that this hybrid approach can further balance clean accuracy and robustness, optimizing the ensemble's ability to learn both robust and diverse non-robust features.
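The hybrid objective in the last point can be sketched as a weighted sum of a standard adversarial-training loss and a loss on distilled examples supplied by the other sub-models. Again a hedged sketch: the PGD attack here is the standard L-infinity variant, and the weighting `lam`, function names, and hyperparameters are illustrative, not the paper's values.

```python
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """Standard L-inf PGD: ascend the classification loss within an eps-ball."""
    ce = nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(ce(model(x_adv), y), x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()         # ascend on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)    # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def hybrid_loss(model, x, y, x_distilled, lam=0.5):
    """Illustrative DVERGE + adversarial-training objective for one sub-model:
    lam weights robustness to self-generated adversarial examples against
    diversity on examples distilled from the other sub-models."""
    ce = nn.CrossEntropyLoss()
    x_adv = pgd_attack(model, x, y)
    return lam * ce(model(x_adv), y) + (1.0 - lam) * ce(model(x_distilled), y)
```

Sweeping `lam` trades clean accuracy against white-box robustness, which matches the balance the paper reports for the combined approach.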
Implications and Future Directions
This research marks a significant step forward in developing robust ensemble models by focusing on vulnerability diversity rather than exclusively on adversarial robustness. The implications extend to applications where reliability against black-box adversarial attacks is crucial, such as autonomous systems and other security-sensitive AI deployments.
Future research could explore various avenues:
- Extending DVERGE to other types of neural architectures beyond CNNs.
- Examining the effects of DVERGE on different datasets to understand scalability and generalizability.
- Investigating the interaction between layer-wise distillation used in DVERGE and the features learned at different depths of deep networks.
Overall, DVERGE offers a compelling framework that bridges clean data performance and adversarial robustness, suggesting a promising direction for future ensemble-based defenses in AI.