- The paper introduces a unified framework that integrates Wasserstein distributional robustness with adversarial training methods.
- It develops novel risk and cost functions to extend traditional approaches from pointwise adversaries to broader distributional shifts.
- Experiments on MNIST, CIFAR10, and CIFAR100 demonstrate significant improvements in adversarial robustness over standard techniques.
A Unified Wasserstein Distributional Robustness Framework for Adversarial Training
Introduction
The article "A Unified Wasserstein Distributional Robustness Framework for Adversarial Training" explores the nexus of adversarial training (AT) and distributional robustness (DR) through the application of the Wasserstein metric. Despite advancements in deep neural networks (DNNs), their vulnerability to adversarial attacks remains a concern. To counteract this, AT methods, which incorporate adversarial examples during training, are one of the predominant approaches for enhancing robustness. However, traditional AT methods such as PGD-AT and TRADES focus on pointwise adversaries, which limit their robustness potential. By leveraging the Wasserstein distance, the paper proposes a unified framework that connects DR with state-of-the-art (SOTA) AT methods and extends their capabilities beyond single data perturbations to distributional shifts.
Theoretical Framework
The proposed framework introduces a new cost function associated with the Wasserstein distance and a series of novel risk functions. This theoretical foundation enables the derivation of distributionally robust versions of prevalent AT methods, and these versions apply more broadly than their standard pointwise forms. The underlying principle is to account for adversarial effects arising from entire distributional perturbations rather than single perturbed inputs, enhancing the robustness of DNN classifiers in more general settings. The mathematical formulation integrates optimal transport, primal-dual transformations, and Wasserstein risk minimization, providing a solid theoretical underpinning for the framework.
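To make the primal-dual structure concrete, the standard Wasserstein distributional robustness objective and its Lagrangian dual can be sketched as follows (this is the generic duality result underlying such frameworks, written in generic notation rather than the paper's exact symbols):

```latex
% Primal: worst-case risk over all distributions Q within
% Wasserstein radius rho of the data distribution P.
\min_{\theta} \;\; \sup_{\mathbb{Q}:\, W_c(\mathbb{Q},\,\mathbb{P}) \le \rho}
  \mathbb{E}_{z \sim \mathbb{Q}}\bigl[\ell(\theta, z)\bigr]

% Dual: for a multiplier lambda >= 0, the sup over distributions
% collapses to a pointwise sup penalized by the transport cost c.
\min_{\theta} \;\; \lambda \rho +
  \mathbb{E}_{z_0 \sim \mathbb{P}}
  \Bigl[\, \sup_{z} \; \ell(\theta, z) - \lambda\, c(z, z_0) \,\Bigr]
```

The dual form is what makes training tractable: the inner supremum is a per-example optimization, recovering a PGD-like inner loop in which the cost function c plays the role of the perturbation constraint.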
Implementation Details
Risk Functions and Cost Function
The paper delineates a series of risk functions that correspond to popular AT methods:
- UDR-PGD: Combines cross-entropy loss with a Wasserstein distributional cost-based term.
- UDR-TRADES: Incorporates KL divergence between original and perturbed data outputs alongside the standard TRADES robustness objective.
- UDR-MART: Extends the MART framework by considering label distribution and margin-based robustness criteria.
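As an illustration of the TRADES-style risk above, the sketch below combines a clean cross-entropy term with a KL-divergence term between clean and perturbed predictions. This is a minimal numpy rendering of the general recipe, not the paper's exact risk function; the weight `beta` is an illustrative hyperparameter.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # CE between a one-hot label and predicted class probabilities.
    return float(-np.log(probs[label] + 1e-12))

def kl_divergence(p, q):
    # KL(p || q): the disagreement term used by TRADES-style risks.
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def trades_style_risk(logits_clean, logits_adv, label, beta=6.0):
    """Illustrative TRADES-style risk: clean cross-entropy plus a KL term
    penalizing disagreement between clean and perturbed predictions.
    `beta` trades off accuracy against robustness (hypothetical value)."""
    p_clean = softmax(logits_clean)
    p_adv = softmax(logits_adv)
    return cross_entropy(p_clean, label) + beta * kl_divergence(p_clean, p_adv)
```

When the perturbed logits equal the clean logits, the KL term vanishes and the risk reduces to the plain clean cross-entropy; a perturbation that flips the prediction inflates the risk through the KL term.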
The cost function is a pivotal component of the framework: the ground cost inside the Wasserstein distance is modified to provide a smoother, more adaptable proximity criterion, one that admits effective gradient-based optimization.
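A common construction for such a ground cost in adversarial settings charges the feature-space distance while effectively forbidding transport across class labels. The sketch below is a generic example of this pattern under stated assumptions, not the paper's exact cost; `label_penalty` is a hypothetical stand-in for an infinite penalty.

```python
import numpy as np

def ground_cost(x, y, x_prime, y_prime, label_penalty=1e6):
    """Illustrative ground cost for a Wasserstein adversary: the Euclidean
    distance between features, plus a very large penalty (a finite proxy
    for infinity) whenever the label changes. This lets probability mass
    move in input space but not across classes."""
    feature_cost = float(np.linalg.norm(np.asarray(x) - np.asarray(x_prime)))
    return feature_cost + (label_penalty if y != y_prime else 0.0)
```

A smoother variant would replace the hard label penalty with a differentiable surrogate, which is the kind of refinement the paper's modified cost targets so that the inner optimization remains gradient-friendly.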
Algorithmic Framework and Training Procedures
The framework proposes a pseudocode-based algorithm that iteratively updates the model parameters (θ) while adapting a penalty parameter (λ) that trades off the loss gained by a perturbation against its distributional cost. The algorithm employs stochastic gradient descent, with λ adjusted dynamically based on the distributional shift observed during training. This adaptive mechanism lets the framework exploit both local and global information from benign examples, improving its efficacy in producing robust models.
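The inner loop of such an algorithm can be sketched as gradient ascent on the Lagrangian surrogate loss(z) − λ·cost(z, z₀), with λ nudged up when the perturbation exceeds its budget and down otherwise. The following is a toy sketch of that adaptive mechanism; all hyperparameters and the squared-distance cost are illustrative assumptions, not values from the paper.

```python
import numpy as np

def inner_maximize(loss_grad, z0, lam=1.0, eps=0.25, step=0.1,
                   iters=50, lam_lr=0.05):
    """Sketch of the adaptive inner maximization: ascend the surrogate
    loss(z) - lam * ||z - z0||^2, while adapting lam so the realized
    transport cost tracks the budget eps. `loss_grad` returns the
    gradient of the (toy) loss at z. Hyperparameters are illustrative."""
    z = z0.copy()
    for _ in range(iters):
        # Ascent step on the penalized objective.
        g = loss_grad(z) - 2.0 * lam * (z - z0)
        z = z + step * g
        # Adapt lam: raise it when the shift overshoots the budget,
        # lower it (down to zero) when there is slack.
        cost = float(np.sum((z - z0) ** 2))
        lam = max(0.0, lam + lam_lr * (cost - eps))
    return z, lam
```

With a linear toy loss whose gradient is a fixed direction, the iterate drifts along that direction until the penalty balances the gain, mirroring how the full algorithm balances adversarial strength against distributional proximity.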
Experimental Results
Extensive experiments conducted on the MNIST, CIFAR10, and CIFAR100 datasets demonstrate the superior adversarial robustness of models trained under the proposed framework compared to traditional AT methods. Evaluations employed attack methods such as PGD, Auto-Attack, and B&B to assess model performance. The results showed consistent improvements in adversarial accuracy across perturbations of varying strength, highlighting the generalization capabilities of the distributional robustness approach.
Conclusion
The paper successfully establishes a connection between Wasserstein distributional robustness and adversarial training methodologies, paving the way for a unified approach that bolsters DNN robustness against adversarial perturbations. By transitioning from pointwise adversaries to distributional shifts, this framework not only extends the theoretical landscape of adversarial training but also provides practical improvements in robust model development. Future directions could involve exploring alternate metrics of distributional distance or integrating domain-specific considerations into the robustness framework for tailored applications.