Fast Gradient Non-sign Methods (FGNM)
- FGNM is defined as a set of algorithms that use full-vector, non-sign gradient updates to preserve directional fidelity and overcome the biases of sign-based methods.
- They encompass fixed-scale and adaptive-scale attack variants, as well as advanced convex optimization and linear system solvers, offering superior convergence and robustness.
- FGNM methods maintain low computational overhead while significantly improving adversarial transferability, convergence rates, and numerical stability in various applications.
Fast Gradient Non-sign Methods (FGNM) encompass a spectrum of optimization and attack algorithms that replace the standard coordinate-wise sign compression of gradient directions with full-vector, non-sign manipulations. These methods are formulated to preserve or enhance directional fidelity to the true gradient, improve convergence, or increase practical efficacy in machine learning, linear system solving, and adversarial attack generation. FGNM approaches are distinct from sign-based updates in both theoretical properties and empirical outcomes, finding application in convex optimization, adversarial robustness, and large-scale numerical linear algebra.
1. Theoretical Foundations and Motivation
Fast Gradient Non-sign Methods are rooted in the observation that quantizing gradients to their coordinate signs, as in SignSGD and the Fast Gradient Sign Method (FGSM), induces a directional bias. For example, in adversarial attacks under an $\ell_\infty$ constraint, the optimal first-order gain is reduced because $\mathrm{sign}(g)$ is not fully aligned with the true gradient $g$. The cosine of the angle between the sign-based perturbation and the original gradient satisfies
$$\cos\langle \mathrm{sign}(g),\, g \rangle = \frac{\|g\|_1}{\sqrt{n}\,\|g\|_2} < 1$$
unless $g$ has entries of equal magnitude (Cheng et al., 2021). In the context of convex optimization, sign-based methods lose curvature information and typically degrade to sublinear convergence, lacking the second-order efficiency present in accelerated or adaptive methods (Cheng et al., 2016).
FGNM strategies instead rescale or otherwise preserve the full gradient vector, thereby maintaining the directional and magnitude information required for higher-order accuracy, effective alignment with low-curvature directions, and improved empirical robustness.
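As a concrete illustration of this bias (a minimal NumPy sketch, not drawn from the cited papers), the cosine between $\mathrm{sign}(g)$ and $g$ falls strictly below 1 for gradients with uneven entry magnitudes, while a rescaled full-gradient step is perfectly aligned by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
g = rng.standard_normal(n) * rng.exponential(1.0, n)  # gradient with uneven entry magnitudes

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Sign-based direction: cosine equals ||g||_1 / (sqrt(n) * ||g||_2), strictly below 1
print(cosine(np.sign(g), g))
print(np.linalg.norm(g, 1) / (np.sqrt(n) * np.linalg.norm(g)))  # identical value

# Rescaled full-gradient (non-sign) direction: perfectly aligned
print(cosine(np.sqrt(n) / np.linalg.norm(g) * g, g))  # 1.0
```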
2. Algorithmic Structures and Principal Variants
FGNM frameworks are instantiated in multiple domains:
- Adversarial Attack Generation: FGNM replaces the sign operation in classical $\ell_\infty$-bounded attack methods by projecting or normalizing the gradient to retain its true direction (a code sketch follows this list).
- Fixed-scale variant (N): Set the step as
$$\delta = \alpha \cdot \frac{\sqrt{n}}{\|g\|_2}\, g,$$
achieving perfect alignment (cosine $= 1$) with $g$ and saturating the norm constraint (Cheng et al., 2021).
- Adaptive-scale variant (K): Select the scaling factor from the ranked gradient magnitudes, e.g., the $k$-th largest entry $|g|_{(k)}$, yielding
$$\delta = \frac{\alpha}{|g|_{(k)}}\, g,$$
with subsequent box clipping to the $\ell_\infty$ constraint. This allows a controlled trade-off between alignment and norm magnitude.
- Smooth and Composite Convex Optimization: Methods such as FLAG and FLARE eschew sign-compression, employing the full gradient in conjunction with adaptive diagonal rescaling matrices. FLAG combines a proximal gradient step, AdaGrad-style preconditioning, mirror descent, and Nesterov-style coupling (Cheng et al., 2016).
- Proximal and mirror steps operate in the adaptively rescaled geometry
$$\|v\|_{D_t}^2 = v^\top D_t\, v, \qquad D_t = \mathrm{diag}\!\Big(\sum_{\tau \le t} g_\tau \odot g_\tau\Big)^{1/2},$$
ensuring efficient adaptation to curvature and achieving accelerated convergence.
- Symmetric Linear System Solvers: The family of FGNM algorithms for symmetric positive definite systems $Ax = b$, including AOA (Asymptotically-Optimal with Alignment), MGA (Minimal Gradient with Alignment), and MGC (Minimal Gradient with Constant Step), periodically applies shorter or constant steps in place of the Cauchy step. These variants focus spectral energy on the critical eigenspaces, eliminating the “zig-zag” behavior of classical steepest descent (Zou et al., 2019).
- For instance, MGA computes the minimal gradient step
$$\alpha_k^{\mathrm{MG}} = \frac{g_k^\top A\, g_k}{g_k^\top A^2 g_k}, \qquad g_k = A x_k - b,$$
and interleaves it with harmonic mean steps to enforce alignment.
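To make the attack-step variants above concrete, here is a minimal NumPy sketch of both rescaling rules. The fixed-scale step follows the formula given earlier; the adaptive-scale rule (the choice of the $k$-th ranked magnitude and the clipping) is an illustrative reading of the description and should be checked against Cheng et al. (2021):

```python
import numpy as np

def fgnm_step_fixed(g, alpha):
    """Fixed-scale ("N") step: keep the full gradient direction and rescale
    so the step has the same l2 norm as alpha * sign(g)."""
    n = g.size
    return alpha * np.sqrt(n) / np.linalg.norm(g) * g

def fgnm_step_adaptive(g, alpha, k, eps):
    """Adaptive-scale ("K") step (sketch): scale by the k-th largest gradient
    magnitude, then clip into the l_inf ball of radius eps. The exact
    scale-selection rule here is an assumption, not the paper's definition."""
    kth = np.sort(np.abs(g))[-k]      # k-th largest |g_i|
    return np.clip(alpha / kth * g, -eps, eps)

# Usage: g stands in for the loss gradient w.r.t. the input image
g = np.random.default_rng(1).standard_normal(3 * 32 * 32)
delta_n = fgnm_step_fixed(g, alpha=2/255)
delta_k = fgnm_step_adaptive(g, alpha=2/255, k=100, eps=8/255)
```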
3. Analytical Properties and Convergence Behavior
In supervised learning and adversarial contexts, by restoring the perturbation direction to be parallel with the gradient, FGNM methods maximize the first-order Taylor gain under norm constraints. This correction addresses the inefficiency of sign-based attacks, as demonstrated theoretically and empirically (Cheng et al., 2021).
In convex optimization, FLAG and FLARE attain an objective gap bound of
$$F(x_T) - F(x^\star) \le \mathcal{O}\!\left(\frac{S_T}{T^2}\right),$$
where $S_T$ reflects the adaptive geometry built from accumulated gradient norms (Cheng et al., 2016). By contrast, sign-based and non-adaptive methods lack such guarantees.
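For intuition, the sketch below shows a simplified proximal gradient step in an AdaGrad-style rescaled geometry for an $\ell_1$-regularized objective. It is a reduction for illustration only; the full FLAG/FLARE methods additionally couple such steps with mirror descent and Nesterov-style momentum (Cheng et al., 2016):

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (tau may be a vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def rescaled_prox_step(x, grad, s_accum, lam, eta=1.0, delta=1e-8):
    """One proximal step in the diagonal metric ||v||_D^2 = v^T D v,
    with D built AdaGrad-style from accumulated squared gradients.
    Simplified stand-in for the FLAG/FLARE geometry, not the full method."""
    s_accum = s_accum + grad**2          # accumulate squared gradients
    D = np.sqrt(s_accum) + delta         # diagonal of the metric
    z = x - eta * grad / D               # gradient step, rescaled per coordinate
    x_new = soft_threshold(z, lam * eta / D)  # prox of lam*||.||_1 in the D-metric
    return x_new, s_accum
```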
FGNM variants for symmetric linear systems are shown to exhibit R-linear convergence under Dai’s Property A, ensuring that $\|g_k\| \to 0$ at a linear rate, and maintaining bounded step sizes for stability (Zou et al., 2019). Spectral alignment analysis demonstrates that periodic use of constant or harmonic mean steps drives search directions towards dominant eigenspaces, unlike the alternating two-dimensional cycles typical in sign-based, steepest-descent, or minimal-gradient approaches.
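The following sketch illustrates the interleaving idea for SPD systems: mostly Cauchy steps, with a periodic minimal gradient step substituted to break the zig-zag cycle. The schedule here is illustrative; the precise AOA/MGA/MGC rules (including the harmonic mean steps) are given in Zou et al. (2019):

```python
import numpy as np

def interleaved_gradient_solve(A, b, x0, iters=300, period=4):
    """Gradient iteration for Ax = b with A symmetric positive definite.
    Every `period`-th step replaces the Cauchy step with a minimal
    gradient step (illustrative alignment-style schedule)."""
    x = x0.copy()
    for k in range(iters):
        g = A @ x - b                     # gradient of f(x) = 0.5 x^T A x - b^T x
        if np.linalg.norm(g) < 1e-12:     # converged; avoid 0/0 in the step sizes
            break
        Ag = A @ g
        if (k + 1) % period == 0:
            alpha = (g @ Ag) / (Ag @ Ag)  # minimal gradient step: minimizes ||g_{k+1}||_2
        else:
            alpha = (g @ g) / (g @ Ag)    # Cauchy (steepest descent) step
        x -= alpha * g
    return x

# Usage on a random SPD system
rng = np.random.default_rng(2)
M = rng.standard_normal((100, 100))
A = M @ M.T + 100 * np.eye(100)
b = rng.standard_normal(100)
x = interleaved_gradient_solve(A, b, np.zeros(100))
print(np.linalg.norm(A @ x - b))          # residual norm after the run
```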
4. Empirical Performance and Benchmarks
Adversarial Attack Transferability
FGNM outperforms sign-based methods in both untargeted and targeted black-box adversarial settings: replacing I-FGSM with I-FGNM (in both the "N" and "K" variants) and SI-FGSM with SI-FGNM ("K") raises average transfer success rates across standard models, and targeted attacks against defense models also show success-rate gains (Cheng et al., 2021).
Perturbations generated by FGNM align much more closely with the gradient, and their norm can be tuned via the "K" parameter.
Composite Convex Optimization
Across six data sets with $\ell_1$-regularized and box-constrained losses, FLAG and FLARE match or surpass FISTA in both objective reduction and test accuracy. In classification on 20 Newsgroups (box-constrained), after 1000 iterations:
| Method | Test Accuracy (%) |
|---|---|
| FISTA | 82.3 |
| FLAG | 84.1 |
| FLARE | 84.2 |
FLARE typically achieves the wall-clock minimum due to efficient iteration acceptance (Cheng et al., 2016).
Linear System Solving
In numerical experiments with random SPD and finite-difference tridiagonal matrices, MGC consistently outperforms Barzilai-Borwein and other two-point methods, especially for ill-conditioned problems. On large, real-world problems, MGC surpasses conjugate gradients in both iteration count and run time. MGC also exhibits improved numerical stability and maintains performance where Krylov methods degrade (e.g., under system perturbation) (Zou et al., 2019).
5. Computational Cost and Algorithmic Simplicity
FGNM implementations introduce minimal additional computational overhead compared to sign-based analogues:
- In attack generation, the step cost is $O(n)$ for fixed-scale normalization and at most $O(n \log n)$ for the adaptive-scale variant due to sorting (Cheng et al., 2021); see the timing sketch after this list.
- FLAG has a per-iteration cost comparable to FISTA, while FLARE often keeps the number of expensive proximal calls per iteration identical to FISTA (Cheng et al., 2016).
- Linear solver variants incur one or two sparse matrix-vector products and a small number of inner products per iteration (Zou et al., 2019).
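As a rough, machine-dependent check of the attack-step costs (illustrative sizes only), the only extra work in the adaptive-scale variant is a sort:

```python
import numpy as np, timeit

g = np.random.default_rng(3).standard_normal(1_000_000)
n = g.size

steps = {
    "sign (baseline)":   lambda: np.sign(g),                          # O(n)
    "fixed-scale (N)":   lambda: np.sqrt(n) / np.linalg.norm(g) * g,  # O(n)
    "adaptive (K) sort": lambda: np.sort(np.abs(g)),                  # O(n log n) dominates
}
for name, fn in steps.items():
    t = timeit.timeit(fn, number=20) / 20
    print(f"{name:18s} {1e3 * t:6.2f} ms/step")
```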
This efficiency facilitates integration into existing optimization and attack pipelines, justifying their use in high-dimensional or real-time settings.
6. Distinction from Sign-Based and Related Methods
FGNM are characterized by the avoidance of coordinate-wise sign compression:
- They preserve full directional information and adaptive geometry, and enable momentum-style mechanisms (linear coupling, alignment steps).
- Sign-based approaches (SignSGD, FGSM) forgo curvature and restrict updates to axis-aligned directions, reducing both convergence rates and attack transferability.
- Empirically, sign-based methods exhibit degraded performance as curvature variance increases or as model defenses are strengthened; FGNM maintain robust performance via geometry-aware, non-quantized steps (Cheng et al., 2016, Cheng et al., 2021).
FGNM thus represent a distinct methodological class with theoretical, practical, and computational advantages in several domains.