Fast Gradient Sign Method (FGSM)
- Fast Gradient Sign Method is a one-step, white-box adversarial attack that perturbs inputs by adding scaled gradient signs to maximize misclassification.
- It uses a single backward pass with a controlled ℓ∞-norm perturbation, ensuring efficient and bounded modifications.
- FGSM serves as a foundation for iterative and non-sign variants that enhance transferability and robustness in diverse applications.
The Fast Gradient Sign Method (FGSM) is a one-step, white-box adversarial attack technique for neural networks, leveraging the linearity of modern architectures to generate adversarial examples by applying perturbations in the direction of the sign of the gradient of the loss function with respect to the input. FGSM is distinguished by its efficiency—a single backward pass suffices for each adversarial sample—and by its guarantee that the resulting perturbation has bounded ℓ∞ norm. FGSM forms the basis for numerous extensions and is widely used both as an attack and as a component in adversarial training pipelines to enhance robustness.
1. Mathematical Definition and Core Algorithm
Given a model parameterized by $\theta$, an input $x$, a true label $y$, and a loss function $J(\theta, x, y)$ (often cross-entropy), FGSM computes adversarial examples as follows:
$$x_{\mathrm{adv}} = x + \varepsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big),$$
where $\varepsilon$ is the perturbation budget controlling the $\ell_\infty$-norm of the perturbation. The goal is to maximize the loss within a small $\ell_\infty$ neighborhood of $x$, so as to induce misclassification:
$$\max_{\|\delta\|_\infty \le \varepsilon} J(\theta, x + \delta, y).$$
This closed-form solution requires a single gradient computation and has been empirically shown to be highly effective against deep models, especially in white-box scenarios (Yilmaz, 2020; Musa et al., 2022; Milton, 2018; Cheng et al., 2021).
For targeted attacks (where the adversarial goal is to move predictions toward a specified class $y_{\mathrm{target}}$), the loss is computed with respect to the target label and the perturbation is inverted:
$$x_{\mathrm{adv}} = x - \varepsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y_{\mathrm{target}})\big).$$
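The one-step update above can be sketched on a toy differentiable model. The logistic-regression loss and its input-gradient here are illustrative stand-ins for a neural network's backward pass; the function and variable names are ours, not from the cited papers:

```python
import numpy as np

def fgsm(x, grad, eps, targeted=False):
    """One-step FGSM: move along (or, for targeted attacks, against)
    the sign of the input-gradient, with L-infinity budget eps."""
    step = eps * np.sign(grad)
    return x - step if targeted else x + step

# Toy stand-in model: logistic loss J = log(1 + exp(-y * w.x)),
# whose gradient w.r.t. the input is -y * sigmoid(-y * w.x) * w.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, -0.5])
y = 1.0
margin = y * w.dot(x)
grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w

x_adv = fgsm(x, grad_x, eps=0.1)
# The perturbation is exactly bounded: ||x_adv - x||_inf == eps
print(np.max(np.abs(x_adv - x)))
```

The single backward pass and the hard $\ell_\infty$ bound are both visible here: the step size per coordinate is always exactly $\varepsilon$.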
2. Theoretical Motivation, Bias, and Limitations
The motivation for the $\mathrm{sign}$ operation under the $\ell_\infty$ constraint arises from maximizing a linearized loss:
$$J(\theta, x + \delta, y) \approx J(\theta, x, y) + \delta^\top \nabla_x J(\theta, x, y).$$
Maximizing the inner product subject to $\|\delta\|_\infty \le \varepsilon$ leads to the closed-form solution $\delta = \varepsilon \cdot \mathrm{sign}(\nabla_x J)$. However, FGSM induces a directional bias: because $\mathrm{sign}(\nabla_x J)$ is a vector of $\pm 1$ entries (or $0$), it ignores the magnitudes of the gradient components. The step is therefore not perfectly aligned with the true gradient $g \in \mathbb{R}^d$ (assuming no zero entries):
$$\cos\angle\big(\mathrm{sign}(g),\, g\big) = \frac{\|g\|_1}{\sqrt{d}\,\|g\|_2} \le 1.$$
For most high-dimensional gradients this cosine is strictly less than one, reflecting a suboptimal ascent direction, which can slow progress in iterative attacks or yield less transferable perturbations (Cheng et al., 2021; Han et al., 2023).
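The misalignment is easy to check numerically. The gradient values below are arbitrary illustrative numbers with unequal magnitudes, which is the typical case for real gradients:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

g = np.array([3.0, 0.1, -0.5, 0.02])  # unequal component magnitudes
c = cosine(np.sign(g), g)

# Closed form: cos = ||g||_1 / (sqrt(d) * ||g||_2), with equality to 1
# only when all components share the same magnitude.
d = g.size
closed = np.sum(np.abs(g)) / (np.sqrt(d) * np.linalg.norm(g))
print(c, closed)
```

Both expressions agree, and for this gradient the cosine is well below 1, quantifying how much of the ascent direction the sign operation discards.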
Recent work analyzes the impact of this bias and introduces refinements. Fast Gradient Non-sign Methods (FGNM) replace the sign operation with a scaled gradient step, adapting the per-step scaling to maintain the $\ell_\infty$ constraint while improving directional alignment. The N-variant yields stepwise updates exactly parallel to the true gradient, while the K-variant further adapts the scaling per coordinate (Cheng et al., 2021). These non-sign approaches improve black-box transferability and attack efficacy.
3. Advanced Iterative Variants and Transferability
Several extensions build on FGSM to improve attack success and transfer. Notable variants include:
- Iterative FGSM (I-FGSM): Applies FGSM in multiple small steps of size $\alpha$, projecting each iterate back onto the valid $\varepsilon$-ball:
$$x^{(t+1)} = \mathrm{Clip}_{x,\varepsilon}\Big(x^{(t)} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(\theta, x^{(t)}, y)\big)\Big).$$
- Momentum Iterative FGSM (MI-FGSM): Introduces a momentum term $g^{(t)}$ with decay factor $\mu$ to stabilize the direction of updates and avoid poor local maxima:
$$g^{(t+1)} = \mu\, g^{(t)} + \frac{\nabla_x J(\theta, x^{(t)}, y)}{\big\|\nabla_x J(\theta, x^{(t)}, y)\big\|_1}, \qquad x^{(t+1)} = \mathrm{Clip}_{x,\varepsilon}\Big(x^{(t)} + \alpha \cdot \mathrm{sign}\big(g^{(t+1)}\big)\Big).$$
- Diverse Input I-FGSM (DI²-FGSM): Incorporates random input transformations at each step to produce more transferable perturbations (Milton, 2018).
These methods can be composed (e.g., M-DI²-FGSM) and are empirically shown to outperform basic FGSM/I-FGSM in black-box and transfer settings.
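The iterative and momentum updates above can be sketched in a few lines. The `grad_fn` callback and the quadratic toy loss are stand-ins for a network's backward pass; setting `mu=0` recovers plain I-FGSM:

```python
import numpy as np

def mi_fgsm(x0, grad_fn, eps, alpha, steps, mu=1.0):
    """MI-FGSM sketch: momentum-accumulated sign steps with projection.

    Each step adds alpha * sign(accumulated gradient), then clips the
    iterate back into the L-infinity eps-ball around x0. mu=0 gives I-FGSM.
    """
    x, g = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)  # L1-normalized accumulation
        x = x + alpha * np.sign(g)
        x = np.clip(x, x0 - eps, x0 + eps)                  # projection step
    return x

# Toy loss to maximize: J(x) = 0.5 * ||x - c||^2, whose gradient is x - c.
c = np.array([2.0, -1.0])
x0 = np.array([0.0, 0.0])
x_adv = mi_fgsm(x0, lambda x: x - c, eps=0.3, alpha=0.1, steps=5)
print(np.max(np.abs(x_adv - x0)))  # never exceeds eps
```

The projection after every step is what keeps the accumulated perturbation inside the same $\varepsilon$-ball that one-step FGSM respects by construction.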
4. Sampling-Based and Non-sign Generalizations
The inefficiency of the sign function in capturing true gradient structure motivates advanced methods:
- Sampling-based Fast Gradient Rescaling Method (S-FGRM): Replaces the sign operator with a data-driven rescaling of the gradient, built from a scaling constant, an elementwise sigmoid, and a normalization step, so that some relative magnitude information is preserved (Han et al., 2023).
- Depth First Sampling (DFSM): Stabilizes updates by averaging gradients across a neighborhood in input space, reducing local noise and further improving transferability (Han et al., 2023).
Empirical ablations show that S-FGRM outperforms MI-FGSM and related baselines, especially in black-box attack rates and robustness to input transformations and defenses.
- Fast Gradient Non-sign Methods (FGNM): As reviewed above, these optimize step directionality by matching the perturbation budget without coarsening to sign-only updates, achieving state-of-the-art transferability in untargeted and targeted black-box attack settings (Cheng et al., 2021).
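One simple non-sign scheme, loosely in the spirit of FGNM (the exact updates in Cheng et al. and Han et al. differ), rescales the raw gradient so its norm matches that of the corresponding sign step, keeping the step exactly parallel to $g$:

```python
import numpy as np

def sign_step(g, eps):
    """Standard FGSM step direction."""
    return eps * np.sign(g)

def scaled_grad_step(g, eps):
    """Illustrative non-sign step: keep the direction of g, rescaled so
    its L2 norm matches the L2 norm of the sign step. This is a sketch
    of the norm-matching idea, not the published FGNM update."""
    zeta = np.linalg.norm(np.sign(g)) / (np.linalg.norm(g) + 1e-12)
    return eps * zeta * g

g = np.array([3.0, 0.1, -0.5])
s = sign_step(g, 0.05)
n = scaled_grad_step(g, 0.05)

# Same L2 budget, but the non-sign step is perfectly aligned with g.
print(np.linalg.norm(s), np.linalg.norm(n))
```

The trade-off is visible in the two steps: the sign step treats the tiny $0.1$ component the same as the dominant $3.0$ component, while the rescaled step preserves their relative magnitudes.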
5. Applications and Empirical Results
FGSM and its variants have been implemented in a wide range of domains:
- Medical imaging: One-step FGSM can reduce CNN accuracy on mammographic classification from roughly 70% to near chance with visually imperceptible perturbations (SSIM > 0.9 for small $\varepsilon$) (Yilmaz, 2020).
- Face recognition: Simple FGSM attacks degrade state-of-the-art VGG-like face models substantially, with untargeted attacks reaching misclassification rates that depend strongly on model training duration (Musa et al., 2022).
- Malware detection: The FGAM method iteratively injects bytes into PE malware to produce image-based adversarial examples that evade deep detectors, achieving up to 91.6% attack success with only 10% byte perturbation and preserving program functionality (Li et al., 2023).
- Robust training and regularization: FGSM-derived adversarial training is known to improve model robustness by enforcing stability of predictions in small local neighborhoods.
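The adversarial-training use mentioned above follows a simple pattern: generate FGSM examples on the fly each iteration and take the parameter update on the perturbed batch. The sketch below uses a toy logistic model on synthetic data (all names and values are illustrative, not from the cited work):

```python
import numpy as np

# Synthetic linearly separable data with labels y in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = np.where(X @ true_w > 0, 1.0, -1.0)

w = np.zeros(3)
eps, lr = 0.1, 0.5
for _ in range(100):
    # Per-sample input-gradient of logistic loss: -y * sigmoid(-y * w.x) * w
    margins = y * (X @ w)
    coeff = -y * (1.0 / (1.0 + np.exp(margins)))            # shape (n,)
    X_adv = X + eps * np.sign(coeff[:, None] * w[None, :])  # per-sample FGSM
    # Parameter gradient of logistic loss evaluated on the adversarial batch.
    m_adv = y * (X_adv @ w)
    grad_w = -(y * (1.0 / (1.0 + np.exp(m_adv))))[:, None] * X_adv
    w -= lr * grad_w.mean(axis=0)

acc = np.mean(np.sign(X @ w) == y)
print(acc)
```

Because every minibatch is perturbed inside its $\varepsilon$-ball before the update, the learned model is pushed toward predictions that are stable over those neighborhoods, which is the robustness effect described above.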
Representative results are summarized in the following table:
| Domain | Model/Setting | Clean Acc. | Post-FGSM Acc. | SSIM / Imperceptibility |
|---|---|---|---|---|
| Mammography (Yilmaz, 2020) | 2-layer CNN, ε=0.01–0.1 | ~70% | 40–70% (ε=0.01), ~50% (ε=0.1) | ≥0.9 (imperceptible, small ε) |
| Face Recognition (Musa et al., 2022) | VGG16+Dense Head, 10–50 epochs | 85–98% | 39–90% (untargeted, ε=0.02–0.04) | N/A |
| Malware (Li et al., 2023) | DenseNet-121 on image PE | 94.1% | 8–91.6% (BARAF–FGAM, r=10%) | N/A |
6. Theoretical Properties and Regularization Perspective
The regularization effect of FGSM and adversarial training is formalized via its connection to $\ell_1$ (LASSO) penalties. In a GLM setting with coefficients $\beta$, training on FGSM-perturbed inputs augments the empirical loss, to first order in $\varepsilon$, as
$$\frac{1}{n}\sum_{i=1}^{n} J\big(\beta;\, x_i + \varepsilon\,\mathrm{sign}(\nabla_x J),\, y_i\big) \approx \frac{1}{n}\sum_{i=1}^{n} J(\beta;\, x_i, y_i) + \varepsilon\,\frac{1}{n}\sum_{i=1}^{n} |r_i|\,\|\beta\|_1,$$
where $r_i$ is the model residual for sample $i$, so the additional term is a data-weighted LASSO penalty (Zuo, 2018). Asymptotically, FGSM-trained estimators enjoy $\sqrt{n}$-consistency and sparsity under suitable conditions. However, unless the noise is "sign-neutral", an extra bias term arises from the imbalance of the noise signs relative to the design.
This suggests that FGSM regularization benefits generalization and robustness but may introduce bias if adversarial directions align unevenly with data covariance.
7. Limitations, Defenses, and Future Directions
FGSM's main limitations are its reliance on local linear approximation and the coarseness of sign-based perturbation, which limit its efficacy on non-linear or highly regularized models and can result in suboptimal transfer, especially against strong black-box defenses (Cheng et al., 2021; Han et al., 2023).
- Iterative, momentum-based, or input-transformation extensions can mitigate these weaknesses but increase computational cost.
- Sampling- and rescaling-based generalizations currently offer the best transferability among efficient attacks.
- Defensive countermeasures include adversarial training, input processing (randomization, compression), robust model architectures, and ensemble detection pipelines (Li et al., 2023, Musa et al., 2022).
- Future research aims at reducing the computational overhead of advanced non-sign or sampling approaches while retaining (or improving) transferability, and at developing adaptive or learned perturbation rescaling mechanisms.
A plausible implication is that further theoretical study of the gradient-magnitude distribution and its effect on attack success will dictate the next generation of adaptive adversarial methods.
References:
- Yilmaz (2020): "Practical Fast Gradient Sign Attack against Mammographic Image Classifier"
- Musa et al. (2022): "Attack Analysis of Face Recognition Authentication Systems Using Fast Gradient Sign Method"
- Milton (2018): "Evaluation of Momentum Diverse Input Iterative Fast Gradient Sign Method (M-DI2-FGSM) Based Attack Method on MCS 2018 Adversarial Attacks on Black Box Face Recognition System"
- Li et al. (2023): "FGAM: Fast Adversarial Malware Generation Method Based on Gradient Sign"
- Cheng et al. (2021): "Fast Gradient Non-sign Methods"
- Han et al. (2023): "Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks"
- Zuo (2018): "Regularization Effect of Fast Gradient Sign Method and its Generalization"