- The paper introduces the Wasserstein distance as a novel threat model that captures natural image transformations beyond conventional ℓp norm perturbations.
- It develops efficient Projected Sinkhorn Iterations to project onto Wasserstein balls, enabling rapid approximation for generating adversarial examples.
- Empirical results on CIFAR10 show standard accuracy dropping from 94.7% to 3% under the attack, while adversarial training with PGD under this threat model improves adversarial accuracy to 76%.
Overview of Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
The paper "Wasserstein Adversarial Examples via Projected Sinkhorn Iterations" by Wong, Schmidt, and Kolter presents a novel approach in the domain of adversarial machine learning, focusing on the development of adversarial examples through a new threat model based on the Wasserstein distance. This work addresses the limitations of the prevalent approach that relies predominantly on adversarial examples defined by ℓp norm-bounded perturbations, which fail to capture the full spectrum of practical image manipulations inherent in natural adversarial scenarios.
Core Contributions
- Introduction of the Wasserstein Distance as a Threat Model: The authors propose the Wasserstein distance as a more intuitive and comprehensive metric for bounding adversarial perturbations. Unlike traditional ℓp norm-based approaches, which measure per-pixel changes, the Wasserstein distance measures the cost of moving pixel mass and therefore reflects natural transformations such as scaling, rotation, and translation (a schematic formulation is given after this list).
- Development of Projected Sinkhorn Iterations: The paper introduces an efficient algorithm to project onto Wasserstein balls, the key operation underpinning adversarial example generation in this framework. The projection is computed with a modified Sinkhorn iteration applied to an entropy-regularized form of the problem, yielding a rapid approximation that can be integrated into adversarial training pipelines (see the Sinkhorn sketch after this list).
- Empirical Evaluation and Competitive Performance: On CIFAR10, the authors demonstrate that their attack degrades model accuracy from 94.7% to 3% at a Wasserstein radius corresponding to moving just 10% of the image mass by one pixel, underscoring the vulnerability of standard models to attacks crafted within the Wasserstein framework. They also show that adversarial training with PGD under this threat model improves adversarial accuracy to 76%, indicating a path toward more robust models (see the attack-loop sketch after this list).
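To make the threat model concrete, the attack can be written as loss maximization over a Wasserstein ball defined by an optimal transport problem. The rendering below uses standard optimal transport notation and is a schematic paraphrase rather than a verbatim reproduction of the paper's formulation; C denotes a transport cost matrix over pixel locations.

```latex
% Wasserstein distance between two images x, y (nonnegative, equal total mass),
% viewed as distributions over pixel locations; C_{ij} is the cost of moving
% mass from pixel i to pixel j:
\[
  d_{\mathcal{W}}(x, y) \;=\; \min_{\Pi \in \mathbb{R}^{n \times n}_{\ge 0}}
  \langle \Pi, C \rangle
  \quad \text{s.t.} \quad \Pi \mathbf{1} = x, \;\; \Pi^{\top} \mathbf{1} = y .
\]
% The adversary maximizes the classification loss over a Wasserstein ball of
% radius \epsilon around the clean input x:
\[
  \max_{x'} \; \ell\bigl(f_{\theta}(x'), y_{\mathrm{true}}\bigr)
  \quad \text{s.t.} \quad d_{\mathcal{W}}(x, x') \le \epsilon .
\]
```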
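The projection step builds on Sinkhorn's matrix-scaling method for entropy-regularized optimal transport. The following is a minimal NumPy sketch of the classical Sinkhorn iteration between two flattened images; the paper's Projected Sinkhorn Iterations modify this scheme to solve a projection onto the Wasserstein ball rather than to evaluate a distance, so it should be read as an illustration of the underlying machinery, not as the authors' algorithm.

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport via classical Sinkhorn iterations.

    a, b : 1-D nonnegative arrays with equal sums (e.g. flattened, normalized images)
    C    : cost matrix, C[i, j] = cost of moving mass from pixel i to pixel j
    reg  : entropy regularization strength (smaller = closer to exact OT, less stable)
    """
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):          # alternate scaling of the kernel's rows/columns
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate optimal transport plan
    return np.sum(P * C), P

# Toy usage: two 4x4 "images" treated as distributions over a 2-D pixel grid.
rng = np.random.default_rng(0)
x = rng.random((4, 4)); x /= x.sum()
y = rng.random((4, 4)); y /= y.sum()
coords = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
C = np.linalg.norm(coords[:, None, :] - coords[None, :, :], ord=1, axis=-1)
dist, plan = sinkhorn_distance(x.ravel(), y.ravel(), C)
print(f"entropy-regularized Wasserstein estimate: {dist:.4f}")
```

Smaller values of `reg` track the exact transport cost more closely at the price of numerical stability, which is one reason the paper works with an entropy-regularized, approximate projection rather than an exact one.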
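Adversarial examples are then generated with a projected gradient descent (PGD) loop in which the usual ℓ∞ projection is replaced by a projection onto the Wasserstein ball. The sketch below shows only this outer structure using standard PyTorch conventions; `project` is a hypothetical stand-in for a Wasserstein-ball projection routine such as the paper's Projected Sinkhorn Iterations, and the sign-gradient update is a generic PGD step rather than the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def wasserstein_pgd(model, x, y, epsilon, project, step_size=0.1, n_steps=50):
    """PGD-style attack whose projection maps iterates back onto a Wasserstein ball.

    project(x_clean, x_candidate, epsilon) is a placeholder for an (approximate)
    Wasserstein-ball projection, e.g. a Projected-Sinkhorn-style routine.
    """
    x_adv = x.clone()
    for _ in range(n_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)            # loss to maximize
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step_size * x_adv.grad.sign()  # ascent step on the loss
            x_adv = project(x, x_adv, epsilon)             # back onto the Wasserstein ball
            x_adv = x_adv.clamp(0.0, 1.0)                  # keep pixel values valid
    return x_adv.detach()
```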
Theoretical and Practical Implications
- Conceptual Advance in Robustness Evaluation:
This work opens a new research trajectory, encouraging the study of adversarial robustness grounded in metrics that capture the invariances classifiers are presumed to have. By formalizing threat models built on convex perturbation sets such as Wasserstein balls, the paper enriches the theoretical understanding of adversarial vulnerabilities.
- Implications for Robust and Certifiably Secure Models:
The introduction of the Wasserstein distance has significant implications for certified defenses. The authors note that current certified defenses, mostly built around ℓ∞ metrics, cannot be directly adapted to the Wasserstein setting; developing models provably robust to Wasserstein distortions will require new architectural and theoretical approaches.
Future Directions
The introduction of the Wasserstein framework suggests several promising directions. Researchers could extend the current results to other forms of convex adversarial perturbations beyond ℓp norms, fostering a deeper examination of natural data distribution shifts and more refined threat models. Further, provable guarantees against Wasserstein-based attacks remain an open challenge; bridging this gap will require new strategies, potentially leveraging advances in transport theory and entropy-based regularization. As these efforts proceed, work like this will be essential in steering the community toward AI systems resilient to more diverse and realistic adversarial conditions.
In conclusion, the paper provides not just an alternative method for generating adversarial examples but also a complementary perspective on understanding model robustness, thereby contributing meaningfully to the broader field of adversarial machine learning.