- The paper introduces the Wasserstein distance as a novel threat model that captures natural image transformations beyond conventional ℓp norm perturbations.
- It develops efficient Projected Sinkhorn Iterations to project onto Wasserstein balls, enabling rapid approximation for generating adversarial examples.
- Empirical results on CIFAR10 show standard accuracy dropping from 94.7% to 3% under the attack, while adversarial training with PGD under this threat model improves adversarial accuracy to 76%.
Overview of Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
The paper "Wasserstein Adversarial Examples via Projected Sinkhorn Iterations" by Wong, Schmidt, and Kolter presents a novel approach in the domain of adversarial machine learning, focusing on the development of adversarial examples through a new threat model based on the Wasserstein distance. This work addresses the limitations of the prevalent approach that relies predominantly on adversarial examples defined by ℓp norm-bounded perturbations, which fail to capture the full spectrum of practical image manipulations inherent in natural adversarial scenarios.
Core Contributions
- Introduction of the Wasserstein Distance as a Threat Model: The authors propose the Wasserstein distance as a more intuitive and comprehensive metric for bounding adversarial perturbations. Unlike traditional ℓp norm-based approaches, which measure per-pixel changes, the Wasserstein distance measures the cost of moving pixel mass and therefore reflects natural transformations such as scaling, rotation, and translation (a schematic formulation is given after this list).
- Development of Projected Sinkhorn Iterations: The paper introduces an efficient algorithm to project onto Wasserstein balls, the key operation underpinning adversarial example generation in this framework. The projection is computed with a modified Sinkhorn iteration applied to an entropy-regularized form of the problem, yielding a rapid approximation that can be integrated into adversarial training pipelines (see the Sinkhorn sketch after this list).
- Empirical Evaluation and Competitive Performance: On CIFAR10, the authors demonstrate that their attack degrades model accuracy from 94.7% to 3% at a Wasserstein radius corresponding to moving just 10% of the image mass by one pixel, underscoring the vulnerability of standard models to attacks crafted within the Wasserstein framework. They also show that adversarial training with PGD under this threat model improves adversarial accuracy to 76%, indicating a path toward more robust models (see the attack-loop sketch after this list).
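To make the threat model concrete, the attack can be written as loss maximization over a Wasserstein ball defined by an optimal transport problem. The rendering below uses standard optimal transport notation and is a schematic paraphrase rather than a verbatim reproduction of the paper's formulation; C denotes a transport cost matrix over pixel locations.

```latex
% Wasserstein distance between two images x, y (nonnegative, equal total mass),
% viewed as distributions over pixel locations; C_{ij} is the cost of moving
% mass from pixel i to pixel j:
\[
  d_{\mathcal{W}}(x, y) \;=\; \min_{\Pi \in \mathbb{R}^{n \times n}_{\ge 0}}
  \langle \Pi, C \rangle
  \quad \text{s.t.} \quad \Pi \mathbf{1} = x, \;\; \Pi^{\top} \mathbf{1} = y .
\]
% The adversary maximizes the classification loss over a Wasserstein ball of
% radius \epsilon around the clean input x:
\[
  \max_{x'} \; \ell\bigl(f_{\theta}(x'), y_{\mathrm{true}}\bigr)
  \quad \text{s.t.} \quad d_{\mathcal{W}}(x, x') \le \epsilon .
\]
```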
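The projection step builds on Sinkhorn's matrix-scaling method for entropy-regularized optimal transport. The following is a minimal NumPy sketch of the classical Sinkhorn iteration between two flattened images; the paper's Projected Sinkhorn Iterations modify this scheme to solve a projection onto the Wasserstein ball rather than to evaluate a distance, so it should be read as an illustration of the underlying machinery, not as the authors' algorithm.

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport via classical Sinkhorn iterations.

    a, b : 1-D nonnegative arrays with equal sums (e.g. flattened, normalized images)
    C    : cost matrix, C[i, j] = cost of moving mass from pixel i to pixel j
    reg  : entropy regularization strength (smaller = closer to exact OT, less stable)
    """
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):          # alternate scaling of the kernel's rows/columns
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate optimal transport plan
    return np.sum(P * C), P

# Toy usage: two 4x4 "images" treated as distributions over a 2-D pixel grid.
rng = np.random.default_rng(0)
x = rng.random((4, 4)); x /= x.sum()
y = rng.random((4, 4)); y /= y.sum()
coords = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
C = np.linalg.norm(coords[:, None, :] - coords[None, :, :], ord=1, axis=-1)
dist, plan = sinkhorn_distance(x.ravel(), y.ravel(), C)
print(f"entropy-regularized Wasserstein estimate: {dist:.4f}")
```

Smaller values of `reg` track the exact transport cost more closely at the price of numerical stability, which is one reason the paper works with an entropy-regularized, approximate projection rather than an exact one.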
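Adversarial examples are then generated with a projected gradient descent (PGD) loop in which the usual ℓ∞ projection is replaced by a projection onto the Wasserstein ball. The sketch below shows only this outer structure using standard PyTorch conventions; `project` is a hypothetical stand-in for a Wasserstein-ball projection routine such as the paper's Projected Sinkhorn Iterations, and the sign-gradient update is a generic PGD step rather than the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def wasserstein_pgd(model, x, y, epsilon, project, step_size=0.1, n_steps=50):
    """PGD-style attack whose projection maps iterates back onto a Wasserstein ball.

    project(x_clean, x_candidate, epsilon) is a placeholder for an (approximate)
    Wasserstein-ball projection, e.g. a Projected-Sinkhorn-style routine.
    """
    x_adv = x.clone()
    for _ in range(n_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)            # loss to maximize
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step_size * x_adv.grad.sign()  # ascent step on the loss
            x_adv = project(x, x_adv, epsilon)             # back onto the Wasserstein ball
            x_adv = x_adv.clamp(0.0, 1.0)                  # keep pixel values valid
    return x_adv.detach()
```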
Theoretical and Practical Implications
- Conceptual Advance in Robustness Evaluation:
This work opens a new research trajectory, encouraging the study of adversarial robustness grounded in metrics that capture the invariances classifiers are presumed to have. By formalizing threat models built on convex perturbation sets such as Wasserstein balls, the paper enriches the theoretical understanding of adversarial vulnerabilities.
- Implications for Robust and Certifiably Secure Models:
The introduction of the Wasserstein distance has significant implications for certified defenses. The authors note that current certified defenses, mostly built around ℓ∞ metrics, cannot be directly adapted to the Wasserstein setting; developing models provably robust to Wasserstein distortions will require new architectural and theoretical approaches.
Future Directions
The introduction of the Wasserstein framework suggests several promising directions. Researchers could extend the current results to other forms of convex adversarial perturbations beyond ℓp norms, fostering a deeper examination of natural data distribution shifts and more refined threat models. Further, provable guarantees against Wasserstein-based attacks remain an open challenge; bridging this gap will require new strategies, potentially leveraging advances in transport theory and entropy-based regularization. As these efforts proceed, work like this will be essential in steering the community toward AI systems resilient to more diverse and realistic adversarial conditions.
In conclusion, the paper provides not just an alternative method for generating adversarial examples but also a complementary perspective on understanding model robustness, thereby contributing meaningfully to the broader field of adversarial machine learning.