Wasserstein Adversarial Examples
- Wasserstein adversarial examples are inputs manipulated using the Wasserstein distance, capturing realistic perturbations like translation and blur beyond traditional ℓp-norms.
- They employ advanced algorithms such as Projected Sinkhorn iterations and dual projection methods to efficiently generate attacks across image and time series domains.
- Empirical results and certified defenses demonstrate significant drops in accuracy and prompt new research into semantic robustness and distributional threat models.
Wasserstein adversarial examples are inputs specifically crafted to induce misclassification in machine learning models, where the perturbation constraint is expressed not by $\ell_p$-norms but by the Wasserstein distance—a measure derived from optimal transport theory. Unlike traditional pixelwise norms, the Wasserstein metric quantifies the minimal "cost" of transforming a source example into a perturbed one by optimally "moving" mass among components (e.g., pixels or time points). This paradigm more naturally encodes spatial or temporal invariances, human-perceived similarity, and structural fidelity. Recent advances in both attack and defense under the Wasserstein threat model span a broad range of domains, encompassing robust image classification, generative modeling, imitation learning, and certified verification.
1. Mathematical Formulation of the Wasserstein Threat Model
The Wasserstein threat model departs from classical $\ell_p$-norm bounded perturbations by considering the Wasserstein distance defined over the distributions induced by an input $x$ and its adversarially modified version $x + \delta$. The adversarial example satisfies:

$$ \mathcal{W}(x, x + \delta) \le \epsilon, $$

where $\mathcal{W}(x, x + \delta)$ is calculated as the optimal transport cost between $x$ and $x + \delta$ (viewed as mass distributions) with respect to a cost matrix $C$, typically encoding spatial displacements or transformations. For images, this may involve moving pixel mass within local windows, capturing deformations such as translation, scaling, or blur that cannot be succinctly represented by $\ell_p$-norms (Wong et al., 2019). For time series, the one-dimensional structure allows a closed-form characterization via cumulative distributions (Wang et al., 2023).
By relaxing the attack region to Wasserstein balls, the threat model encompasses "natural" and perceptually aligned perturbations, which are often imperceptible even when large under $\ell_p$ metrics.
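As a concrete illustration of the transport cost underlying this threat model, the sketch below computes an entropy-regularized optimal transport cost (a Wasserstein-style distance) between two small grayscale images via plain Sinkhorn iterations. It is a minimal NumPy example for intuition only; the function name, dense (window-free) cost matrix, and regularization value are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def sinkhorn_wasserstein(x, y, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between two images.

    x, y: 2-D arrays of nonnegative pixel intensities (same shape).
    Pixels are treated as points on the grid; the ground cost is the
    squared Euclidean distance between pixel coordinates.
    """
    h, w = x.shape
    a = (x / x.sum()).ravel()          # source mass distribution
    b = (y / y.sum()).ravel()          # target mass distribution

    # Ground cost: squared distance between pixel coordinates.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"),
                      axis=-1).reshape(-1, 2).astype(float)
    C = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)

    K = np.exp(-C / reg)               # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):           # Sinkhorn scaling updates
        u = a / (K @ v + 1e-12)
        v = b / (K.T @ u + 1e-12)

    P = u[:, None] * K * v[None, :]    # approximate transport plan
    return float((P * C).sum())        # transport cost <P, C>

# Toy usage: a bright patch shifted by one pixel incurs a small transport cost.
img = np.zeros((8, 8)); img[2:4, 2:4] = 1.0
shifted = np.roll(img, 1, axis=1)
print(sinkhorn_wasserstein(img, shifted))
```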
2. Algorithmic Methodologies for Generating Wasserstein Adversarial Examples
Adapting projected gradient descent (PGD) to the Wasserstein ball introduces unique optimization challenges. The projection step onto the Wasserstein ball $\mathcal{B}_{\mathcal{W}}(x, \epsilon) = \{ w : \mathcal{W}(x, w) \le \epsilon \}$,

$$ \operatorname{proj}_{\mathcal{B}_{\mathcal{W}}(x, \epsilon)}(z) = \arg\min_{w \in \mathcal{B}_{\mathcal{W}}(x, \epsilon)} \tfrac{1}{2}\, \| w - z \|_2^2 , $$

is nontrivial and requires bespoke solvers:
- Projected Sinkhorn Iteration: A modified Sinkhorn scaling procedure is used to solve a regularized OT problem by iteratively updating dual variables, analogous to entropic regularized transport. Updates for the dual vectors are block-coordinate-wise, employing logarithmic and exponential transforms to compute efficient scaling and cost estimation (Wong et al., 2019). Local transport restriction (e.g., to small image patches) reduces the per-iteration complexity from $O(n^2)$ to $O(nk^2)$ for $n$ pixels and a $k \times k$ window.
- Dual Projection Operators and Frank–Wolfe Methods: Later works advance exact dual projection algorithms based on Lagrangian duality, allowing line-search or bisection for dual variables and simplex projection row-wise. Entropic regularization within the Frank–Wolfe linear minimization oracle (LMO) grants further efficiency and strong convexity, providing primal solutions with closed-form updates per batch (Wu et al., 2020).
- Time Series Specialization: In one-dimensional domains, the Wasserstein projection leverages the closed-form expression of the 1-Wasserstein distance in terms of cumulative distribution functions, enabling efficient gradient-descent-based projection and, optionally, a two-step routine: first constraining the search via an $\ell_p$ ball, then refining within the Wasserstein ball to balance attack imperceptibility and efficacy (Wang et al., 2023); a simplified sketch of the one-dimensional case follows below.
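For intuition about the one-dimensional case, the sketch below computes the closed-form 1-Wasserstein distance between equal-mass time series via cumulative sums, and uses a crude bisection-based shrinking step as a stand-in for projection onto the Wasserstein ball. This is a simplified, hypothetical routine (signals are treated formally as equal-mass measures over time indices), not the exact algorithm of Wang et al. (2023).

```python
import numpy as np

def w1_time_series(x, y):
    """Closed-form 1-Wasserstein distance between two time series with equal
    total mass, interpreted as mass over time indices: the L1 distance
    between their cumulative sums."""
    assert np.isclose(x.sum(), y.sum())
    return float(np.abs(np.cumsum(x) - np.cumsum(y)).sum())

def shrink_into_wasserstein_ball(x, x_adv, eps, tol=1e-6):
    """Crude stand-in for projection onto the Wasserstein ball around x:
    bisect a scaling factor t so that W1(x, x + t * delta) <= eps.
    (A heuristic, not the exact projection used in the literature.)"""
    delta = x_adv - x
    delta = delta - delta.mean()          # keep total mass unchanged
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w1_time_series(x, x + mid * delta) <= eps:
            lo = mid
        else:
            hi = mid
    return x + lo * delta

# Toy usage on a synthetic signal.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128)
x = np.exp(-((t - 0.5) / 0.05) ** 2)      # a clean bump-shaped signal
x_adv = x + 0.05 * rng.standard_normal(128)
x_proj = shrink_into_wasserstein_ball(x, x_adv, eps=0.1)
print(w1_time_series(x, x_proj))          # within the eps = 0.1 ball
```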
Under these methodologies, adversarial examples can sharply degrade the accuracy of high-performing ResNet models on CIFAR-10 at small Wasserstein radii (Wong et al., 2019), and exact projections drive adversarial accuracy lower still, e.g., at radius $0.005$ (Wu et al., 2020).
3. Distributional and Semantic Extensions
The distributional robustness perspective generalizes adversarial attacks to perturb entire probability measures within a Wasserstein ball. The adversarial risk is expressed as:

$$ \sup_{Q:\, \mathcal{W}_c(Q, P) \le \epsilon} \ \mathbb{E}_{(x, y) \sim Q}\big[\ell(f_\theta(x), y)\big], $$

for a base distribution $P$ and a measurable transport cost $c$. This framework unifies and extends methods such as PGD-AT, TRADES, and MART by reinterpreting them as special cases in which the adversary acts pointwise rather than over distributions (Bui et al., 2022, Bai et al., 2023).
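A common way to make this inner supremum tractable is the Lagrangian relaxation used in Wasserstein distributionally robust training: for each sample, maximize the loss minus a transport-cost penalty. The PyTorch sketch below illustrates that generic relaxation only; the model, squared-Euclidean cost, and hyperparameters are placeholders, and the cited works add further refinements.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def wrm_inner_max(model, x, y, gamma=1.0, step=0.1, n_steps=15):
    """Approximate the inner sup of the distributional adversarial risk via its
    Lagrangian relaxation:  max_{x'}  loss(f(x'), y) - gamma * ||x' - x||^2.
    Plain gradient ascent on x'; a larger gamma keeps x' closer to x
    (i.e., a smaller effective Wasserstein radius)."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x_adv), y)
        penalty = gamma * ((x_adv - x) ** 2).sum(dim=tuple(range(1, x.dim()))).mean()
        objective = loss - penalty
        grad, = torch.autograd.grad(objective, x_adv)
        with torch.no_grad():
            x_adv += step * grad          # ascent step on the relaxed objective
    return x_adv.detach()

# Toy usage with a placeholder linear classifier on flattened 8x8 "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))
x = torch.rand(16, 1, 8, 8)
y = torch.randint(0, 10, (16,))
x_adv = wrm_inner_max(model, x, y)
robust_loss = F.cross_entropy(model(x_adv), y)   # surrogate for the robust risk
robust_loss.backward()                           # gradients for the model update
print(float(robust_loss))
```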
Novel constructions such as the Internal Wasserstein Distance (IWD) compare empirical distributions of local features or patches, allowing generation of "semantically similar" adversarial examples. IWD-based attacks maximize classifier loss while ensuring the Wasserstein distance between internal feature sets (patch representations) remains small, producing diverse adversaries that respect the data manifold more closely than those derived via $\ell_p$-norms (Wang et al., 2021).
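To make the patch-level idea concrete, the sketch below computes an exact optimal transport distance between two equal-size sets of raw image patches by solving the corresponding assignment problem (with uniform weights, OT reduces to a minimum-cost matching). This is an illustrative simplification with hypothetical helper names; IWD itself operates on learned internal feature representations rather than raw pixels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def extract_patches(img, size=4, stride=4):
    """Collect non-overlapping size x size patches as flattened vectors."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size].ravel()
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return np.stack(patches)

def internal_transport_distance(img_a, img_b, size=4):
    """Exact optimal transport between two equal-size patch sets with uniform
    weights, computed as a minimum-cost assignment over pairwise patch costs."""
    pa, pb = extract_patches(img_a, size), extract_patches(img_b, size)
    cost = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(-1)   # pairwise costs
    rows, cols = linear_sum_assignment(cost)                  # optimal matching
    return cost[rows, cols].mean()

# Toy usage: compare an image against a locally rearranged version of itself.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
perm = rng.permutation(16)            # shuffle rows to rearrange patch content
print(internal_transport_distance(img, img[perm]))
```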
Layerwise or high-level feature perturbations further extend the approach: perturbing intermediate decoder layers in generative autoencoder architectures while constraining the Wasserstein distance in image space yields adversarial examples with semantic alterations (such as added texture, structure, or color shifts) in a targeted, interpretable manner (Čermák et al., 2021).
4. Certified Robustness and Verification
Defense against Wasserstein adversaries centers on both empirical approaches (adversarial training) and certified techniques. Certified robustness via Wasserstein Smoothing exploits the fact that the 1-Wasserstein distance between images, viewed as probability distributions, can be upper bounded by the $\ell_1$ norm of a flow in a "flow" domain. A local flow plan $\delta$ is injected with Laplacian noise:

$$ g(x) = \mathbb{E}_{\delta \sim \mathrm{Laplace}(0, \sigma)}\big[ f(x \oplus \delta) \big], $$

where $x \oplus \delta$ denotes the image after the flow $\delta$ has been applied. Robustness is then certified using bounds that guarantee invariance of the classifier's top prediction within a Wasserstein ball, leveraging prior theory for randomized smoothing in $\ell_1$ space (Levine et al., 2019).
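The flow-domain construction can be sketched as follows: a flow assigns signed mass to each edge between adjacent pixels, applying a flow adds its net inflow to every pixel (conserving total mass), and the smoothed classifier votes over predictions under Laplace-noised flows. The code below is a simplified toy version under these assumptions, not the authors' released implementation, and it omits the certification bound itself.

```python
import numpy as np

def apply_flow(img, flow_h, flow_v):
    """Move mass along grid edges: flow_h[i, j] sends mass from pixel (i, j)
    to (i, j+1); flow_v[i, j] from (i, j) to (i+1, j). Negative values move
    mass in the opposite direction. Total image mass is conserved."""
    out = img.copy()
    out[:, :-1] -= flow_h; out[:, 1:] += flow_h   # horizontal transport
    out[:-1, :] -= flow_v; out[1:, :] += flow_v   # vertical transport
    return out

def smoothed_predict(classify, img, sigma=0.01, n_samples=200, rng=None):
    """Wasserstein-smoothing-style prediction: majority vote of the base
    classifier over images perturbed by Laplace-distributed random flows.
    `classify` maps an image to an integer label (placeholder interface)."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    votes = {}
    for _ in range(n_samples):
        fh = rng.laplace(scale=sigma, size=(h, w - 1))
        fv = rng.laplace(scale=sigma, size=(h - 1, w))
        label = classify(apply_flow(img, fh, fv))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy usage with a trivial "classifier" comparing top- and bottom-half brightness.
img = np.random.default_rng(1).random((8, 8))
classify = lambda x: int(x[:4].sum() > x[4:].sum())
print(smoothed_predict(classify, img))
```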
A complementary verification technique “lifts” the certification and attack problem into the flow domain and applies convex polytope or $\ell_p$-ball methods. When the relevant layerwise affine mappings and feasibility constraints are enforced, this approach provides either complete or incomplete certification of the absence (or existence) of adversarial examples within Wasserstein balls (Wegel et al., 2021).
5. Empirical Results and Applications
Substantial empirical evidence demonstrates the effectiveness of Wasserstein-based adversarial examples and defenses:
- On CIFAR-10, exact Wasserstein attacks (dual projection or Frank–Wolfe) reduce model accuracy substantially further than previous approximate projections at the same moderate radii (Wu et al., 2020).
- Adversarial training with Wasserstein examples increases robustness to geometric deformations not captured by $\ell_p$-norm constraints, with adversarial accuracy improving markedly over the undefended baseline after PGD-based adversarial training (Wong et al., 2019).
- In time series (ECG) classification, Wasserstein adversaries achieve high attack success rates on certain datasets while using much smaller and less perceptible perturbations than $\ell_p$-based attacks (Wang et al., 2023).
- In unsupervised generative modeling, Wasserstein-based objectives allow drastic reductions in model size while maintaining or improving sample quality (e.g., a marked reduction in the number of required CNNs compared to previous INN methods) (Lee et al., 2017).
- Certified defense strategies can guarantee nonexistence of adversaries within provable Wasserstein radii, although certified radii are often small for complex data modalities (Levine et al., 2019, Wang et al., 2023).
6. Extensions, Limitations, and Open Directions
Research highlights several extensions and unresolved issues:
- The adaptation of Wasserstein adversarial example frameworks to non-image data such as univariate and multivariate time series, point clouds, and scientific simulations is an expanding area (Erdmann et al., 2018, Wang et al., 2023).
- Certified robustness techniques remain limited by computational constraints—exact certificates are feasible for lower-dimensional or specially structured transformations, while high-dimensional image or flow domains often require incomplete or conservative relaxations (Wegel et al., 2021).
- Adversarial training under Wasserstein or distributional threat models yields models with improved generalization to previously unseen or unmodeled distributional shifts, but practical scaling and trade-offs (e.g., selectivity and statistical efficiency) are still under active study (Bui et al., 2022, Bai et al., 2023).
- The interplay between metric uncertainty, model compactness, and robustness indicates lower bounds: universal robustness against an unknown class of transport metrics may require capacity-prohibitive models or additional cryptographic primitives (Döttling et al., 2020).
- Semantically-aware threat models (internal Wasserstein, layerwise transport) offer increased attack diversity and challenge standard defense mechanisms, necessitating improved strategies for semantic robustness (Wang et al., 2021, Čermák et al., 2021).
7. Broader Impact and Research Directions
Wasserstein adversarial examples are reshaping the understanding of machine learning robustness by exposing model vulnerabilities to geometrically meaningful, theoretically principled, and perceptually aligned perturbations. The integration of optimal transport metrics into adversarial training, certified verification, and generative adversarial learning broadens the scope of robust ML, demanding both deeper theoretical foundations and practical algorithmic innovation. Current research continues to address computational bottlenecks (exact projection, scalable certification), domain extensions (multimodal and distributional settings), and adaptive reward/cost functions tailored for both robust prediction and generation.
Papers and code repositories relevant for practical implementations include (Wong et al., 2019, Wu et al., 2020, Levine et al., 2019, Wegel et al., 2021, Wang et al., 2023, Bai et al., 2023).