- The paper introduces a feature scattering-based adversarial training method that generates adversarial examples in an unsupervised, collaborative fashion, thereby avoiding label leaking.
- The paper leverages optimal transport to maximize the feature matching distance between clean and perturbed samples within a bilevel optimization framework.
- Experimental results on CIFAR10, CIFAR100, and SVHN show significant improvements in adversarial robustness; on CIFAR10, accuracy under PGD attack improves by 25.6 percentage points over the Madry et al. baseline.
Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training
This paper presents an innovative feature scattering-based adversarial training method to enhance model robustness against adversarial attacks. Traditional adversarial training methods utilize a supervised scheme for generating adversarial samples, often encountering issues like label leaking. The proposed approach distinguishes itself by adopting an unsupervised methodology to generate adversarial images via feature scattering in the latent space, effectively circumventing the challenge of label leaking. Moreover, this approach emphasizes collaborative perturbation generation by considering inter-sample relationships, as opposed to treating each sample in isolation.
Main Contributions
The paper makes several contributions to improve adversarial training:
- Novel Approach: It introduces a feature-scattering technique for creating adversarial images in an unsupervised, collaborative fashion. This method diverges from the traditional minimax formulation common in adversarial training.
- Bilevel Optimization: The proposed training objective is cast as an instance of a more general bilevel optimization problem, in which an inner problem generates the perturbations and an outer problem updates the model parameters (a sketch of the formulation follows this list).
- Robustness Analysis: Through extensive experimentation on various datasets, the paper analyzes the effectiveness of feature scattering in comparison to state-of-the-art adversarial training techniques.
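As a rough sketch (the symbols below are illustrative notation, not a verbatim reproduction of the paper's equations), the training objective can be written as a bilevel problem in which the outer level minimizes the supervised loss on perturbed inputs and the inner level maximizes an unsupervised OT feature-matching distance:

```latex
% Illustrative notation (not copied from the paper): f_\theta is the classifier,
% g_\theta the feature extractor, \delta_z a Dirac mass at z, \mathcal{D} the OT
% feature-matching distance, and \epsilon the L_\infty perturbation budget.
\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\bigl( f_{\theta}(x_i'), \, y_i \bigr)
\quad \text{s.t.} \quad
\{x_i'\} = \arg\max_{\|x_i' - x_i\|_{\infty} \le \epsilon}
\mathcal{D}\!\left( \frac{1}{n} \sum_{i} \delta_{g_{\theta}(x_i)}, \;
                    \frac{1}{n} \sum_{i} \delta_{g_{\theta}(x_i')} \right)
\]
```

Unlike the standard minimax formulation, the inner problem does not use the labels: it only maximizes the feature-matching distance over the whole batch, which is what avoids label leaking.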
Methodology
The feature scattering method maximizes the feature matching distance between the empirical distributions of clean and perturbed samples. The optimal transport (OT) distance serves as this matching distance, with the ground cost defined on features extracted by the model. Because the perturbations are generated jointly for a batch, the technique preserves inter-sample structure while producing adversarial perturbations, avoiding the pitfalls of label-guided adversarial examples that may drift off the data manifold.
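To make the inner maximization concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation. It assumes a hypothetical `model.features` feature extractor, approximates the OT distance with entropy-regularized Sinkhorn iterations, and uses illustrative hyperparameters.

```python
# A minimal sketch of feature-scattering perturbation generation.
# Assumptions: hypothetical `model.features` attribute; Sinkhorn approximation
# of the OT distance; epsilon/step size/regularization values are illustrative.
import torch
import torch.nn.functional as F


def sinkhorn_ot_distance(feat_clean, feat_adv, eps_reg=0.1, n_iters=30):
    """Entropy-regularized OT distance between two batches of features.

    Ground cost is (1 - cosine similarity) between clean and perturbed
    features; both marginals are uniform over the batch.
    """
    a = F.normalize(feat_clean, dim=1)
    b = F.normalize(feat_adv, dim=1)
    cost = 1.0 - a @ b.t()                       # (n, n) ground cost matrix
    n = cost.size(0)
    mu = torch.full((n,), 1.0 / n, device=cost.device)
    nu = torch.full((n,), 1.0 / n, device=cost.device)
    K = torch.exp(-cost / eps_reg)               # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                     # Sinkhorn fixed-point updates
        v = nu / (K.t() @ u + 1e-9)
        u = mu / (K @ v + 1e-9)
    transport = u.unsqueeze(1) * K * v.unsqueeze(0)
    return torch.sum(transport * cost)           # <T, C>: feature-matching distance


def feature_scattering_perturb(model, x, epsilon=8 / 255, step_size=8 / 255, n_steps=1):
    """Generate perturbations by ascending the OT feature-matching distance.

    No labels are used (the unsupervised aspect that avoids label leaking),
    and the whole batch is perturbed jointly through the OT coupling
    (the collaborative aspect).
    """
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    with torch.no_grad():
        feat_clean = model.features(x)           # fixed clean reference features
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = sinkhorn_ot_distance(feat_clean, model.features(x_adv))
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                        # ascent step
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

The outer step then minimizes the usual supervised loss on the perturbed batch with the true labels, e.g. `F.cross_entropy(model(x_adv), y)`, completing the bilevel scheme sketched above.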
Experimental Results
The efficacy of the proposed approach is validated on benchmark datasets such as CIFAR10, CIFAR100, and SVHN:
- On CIFAR10, the proposed method achieves 70.5% accuracy under a standard 20-step PGD attack, outperforming prior methods by notable margins (e.g., 25.6 percentage points over the Madry method); a sketch of this evaluation protocol appears after the list.
- Experiments on CIFAR100 and SVHN further corroborate the robustness of the proposed approach, showing substantial improvements in adversarial accuracy under white-box attacks compared to existing models.
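For context, the robust accuracy numbers above come from white-box attacks such as PGD-20. A minimal sketch of that evaluation protocol, under the assumptions noted in the comments, is:

```python
# A minimal sketch of a 20-step PGD robustness evaluation.
# Assumptions: `model` maps images in [0, 1] to logits; epsilon = 8/255 and
# step size = 2/255 are common CIFAR10 conventions, not values stated here.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, epsilon=8 / 255, step_size=2 / 255, n_steps=20):
    """Untargeted L-infinity PGD: iterated signed-gradient ascent on the loss."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                        # ascent step
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()


def robust_accuracy(model, loader):
    """Fraction of samples still classified correctly under PGD-20."""
    device = next(model.parameters()).device
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```

Random initialization within the epsilon-ball and per-step projection back into it are the standard PGD ingredients; adding restarts or more steps would make the evaluation more conservative.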
Implications and Future Directions
The implications of this research are twofold. Practically, the feature scattering method enables training models that are inherently more robust to adversarial attacks without incurring the time and computational penalties associated with traditional adversarial training iterations. Theoretically, it opens avenues for leveraging inter-sample structure more effectively, encouraging exploration of collaborative perturbation techniques across machine learning domains.
Future research can further refine this unsupervised adversarial sample generation approach, potentially integrating other structural learning paradigms and exploring its applications in various domains beyond image classification, such as object detection and natural language processing. Additionally, investigating the theoretical bounds of adversarial robustness achievable through such collaborative methods can yield deeper insights into the limitations and capabilities of current adversarial defense strategies.