Instance-Specific Gradient Rescaling

Updated 4 February 2026
  • Instance-Specific Gradient Rescaling is a technique that resizes gradient coordinates adaptively for each data instance, preserving the local structure of the gradient.
  • The S-FGRM method replaces the standard sign function with an instance-driven scaling based on log-compression, normalization, and sigmoid transforms to capture both direction and magnitude.
  • Empirical results demonstrate that this approach enhances black-box adversarial attack transferability by up to 30–50 percentage points compared to traditional methods.

Instance-Specific Gradient Rescaling refers to the class of methods in which the per-iteration update direction in gradient-based optimization is adaptively and non-uniformly rescaled based on the local structure of the gradient for each data instance. This technique mitigates the information loss caused by globally uniform mapping functions such as the element-wise sign function, and instead preserves instance-specific coordinate importance. It is most prominently utilized in the context of adversarial attacks on deep neural networks, where transferability and the geometric fidelity of the perturbation direction are critical (Han et al., 2023).

1. Motivation and Limitations of Sign-Based Rescaling

The default approach in many adversarial attack algorithms, such as FGSM, I-FGSM, MI-FGSM, and NI-FGSM, uses the element-wise sign function to map a raw gradient $\nabla_x J(x, y)$ onto the boundary of an $\ell_\infty$ norm ball. While computationally expedient, the sign function discards magnitude information and may fail to align the perturbation direction with the most informative axes for the input instance. The consequence is a deviation between the original gradient and the added noise, leading to suboptimal estimates of the true direction of maximal loss increase. This degradation is especially detrimental to transferability in black-box scenarios (Han et al., 2023).
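The information loss can be made concrete with a small numeric experiment (a sketch using a synthetic heavy-tailed gradient, not a measurement on any specific network): when gradient magnitudes span several orders of magnitude, the sign of the gradient points far away from the gradient itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated heavy-tailed gradient: a few coordinates dominate while most are
# tiny, as is typical for input gradients of deep networks.
g = rng.standard_normal(10_000) * 10.0 ** rng.uniform(-6, 0, 10_000)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# sign() maps every coordinate to +/-1, so tiny coordinates receive the same
# weight as dominant ones and the update direction rotates away from g.
print("cos(g, sign(g)) =", cosine(g, np.sign(g)))
```

For this synthetic distribution the cosine similarity between $g$ and $\text{sign}(g)$ falls well below 1, illustrating the geometric deviation the rescaling methods below aim to avoid.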

2. Data-Driven Instance-Specific Gradient Rescaling: S-FGRM

The Sampling-based Fast Gradient Rescaling Method (S-FGRM) introduces a replacement for the sign function that rescales each coordinate based on the local statistics of the gradient, computed per instance and per iteration:

$$\text{rescale}(g) = c \cdot \operatorname{sign}(g) \odot \sigma\!\big(\operatorname{norm}(\log_2 |g|)\big), \qquad \operatorname{norm}(x) = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)}, \qquad \sigma(u) = \frac{1}{1 + e^{-u}}$$

where $c$ is a predetermined maximum magnitude (e.g., $c = 2$). This transformation leverages log-compression to reduce the influence of extreme values, per-instance normalization to center and scale the gradient, and a sigmoid to confine the relative importance weights. The transformation is strictly instance-specific: each input’s gradient distribution yields different scaling, preserving the relative strength of coordinates and thus the geometric fidelity with respect to the loss surface (Han et al., 2023). Unlike sign-based updates, S-FGRM captures both direction and local magnitude structure.
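A minimal NumPy sketch of this rescaling rule, assuming the statistics are taken over all coordinates of a single input's gradient; the small `eps` guard is an implementation choice, not part of the published formula:

```python
import numpy as np

def rescale(g, c=2.0, eps=1e-12):
    """Instance-specific gradient rescaling in the style of S-FGRM.

    `eps` guards the log and the standard deviation against zeros
    (an implementation choice, not part of the published formula)."""
    logmag = np.log2(np.abs(g) + eps)                        # log-compression of magnitudes
    z = (logmag - logmag.mean()) / (logmag.std() + eps)      # per-instance standardization
    weights = 1.0 / (1.0 + np.exp(-z))                       # sigmoid confines weights to (0, 1)
    return c * np.sign(g) * weights                          # direction from sign, magnitude from weights
```

Every output coordinate has magnitude below $c$, the sign of each coordinate is preserved, and coordinates with larger $|g_i|$ receive larger weights, which is exactly the relative-importance structure the sign function destroys.

```python
g = np.array([1e-4, -1.0, 10.0])
r = rescale(g)
```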

3. Depth-First Sampling and Update Stabilization

Small or near-zero gradient components are particularly susceptible to numerical instability. S-FGRM incorporates Depth First Sampling (DFS) to regularize the rescaled update:

  • At each iteration, a path of $N$ local perturbations $\{\xi_i\}$ is sequentially sampled, where each subsequent point is offset from the prior, producing a “depth-first chain” rather than i.i.d. noise about the anchor.
  • Gradients at each location are averaged to obtain a smoothed estimate before applying the instance-specific rescaling.

Algorithmically, for the $t$-th iteration:

$$\begin{aligned}
x_t^0 &= x_t^{\text{adv}} \\
x_t^{i+1} &= x_t^i + \xi_i, \quad \xi_i \sim \text{Uniform}(-\beta\epsilon, +\beta\epsilon) \\
\hat{g}_{t+1} &= \frac{1}{N+1} \sum_{i=0}^{N} \nabla_x J(x_t^i, y) \\
g_{t+1} &= \mu\, g_t + \frac{\hat{g}_{t+1}}{\|\hat{g}_{t+1}\|_1} \\
x_{t+1}^{\text{adv}} &= x_t^{\text{adv}} + \alpha\, \text{rescale}(g_{t+1})
\end{aligned}$$

DFS mitigates fluctuation in the rescaling weights and enhances update stability, a crucial aspect when gradient entries span several orders of magnitude (Han et al., 2023).
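The iteration described in this section can be sketched end to end with a toy analytic gradient oracle `grad_J` standing in for the network's input gradient; the oracle, the hyperparameter values, and the final $\ell_\infty$ clip are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def rescale(g, c=2.0, eps=1e-12):
    # Instance-specific rescaling (S-FGRM style); eps is a stability guard.
    logmag = np.log2(np.abs(g) + eps)
    z = (logmag - logmag.mean()) / (logmag.std() + eps)
    return c * np.sign(g) / (1.0 + np.exp(-z))

# Toy surrogate "model": grad_J returns the gradient of J(x) = sin(W @ x).
# In a real attack this would be backpropagation through the network's loss.
W = rng.standard_normal(16)
def grad_J(x):
    return W * np.cos(W @ x)

def smi_fgrm(x, steps=10, eps_budget=16 / 255, mu=1.0, N=8, beta=1.5):
    alpha = eps_budget / steps
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(steps):
        # Depth-first sampling: each sample is offset from the *previous* one,
        # forming a chain rather than i.i.d. noise around the anchor.
        pts, p = [x_adv], x_adv
        for _ in range(N):
            p = p + rng.uniform(-beta * eps_budget, beta * eps_budget, x.shape)
            pts.append(p)
        g_hat = np.mean([grad_J(p) for p in pts], axis=0)     # smoothed gradient
        g = mu * g + g_hat / (np.abs(g_hat).sum() + 1e-12)    # momentum, L1-normalized
        # Standard l_inf-ball projection (assumed here; common in this family).
        x_adv = np.clip(x_adv + alpha * rescale(g), x - eps_budget, x + eps_budget)
    return x_adv

x = rng.uniform(0, 1, 16)
x_adv = smi_fgrm(x)
```

The structure mirrors the update equations above: chain sampling, gradient averaging, momentum accumulation over the $L_1$-normalized smoothed gradient, and finally the instance-specific rescaled step.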

4. Integration with Advanced Attack Frameworks

Instance-specific rescaling is designed as a modular drop-in for any gradient-based adversarial attack that relies on the sign of the gradient, such as FGSM, I-FGSM, MI-FGSM, and NI-FGSM. S-FGRM extends to input-transform attacks (DIM, TIM, SIM), as well as composite transformations (CTM) and ensemble-method scenarios. In all these cases, the gradient is aggregated (across transformations/models), rescaled per the S-FGRM rule, and used to update the adversarial example. The compatibility is agnostic to the choice of input transformations or gradient smoothing modalities (Han et al., 2023).

Additionally, S-FGRM’s per-instance adaptive scaling can be synergistically combined with model-ensemble methods by averaging over surrogate network gradients before applying rescaling, further enhancing transferability even against adversarially trained or certified models.
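The aggregation order described above can be sketched briefly: gradients from the surrogate models are averaged first, and the instance-specific rescaling is applied once to the fused gradient. The gradient oracles and step size here are illustrative stand-ins, not real networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-model input-gradient oracles; in practice each would
# backpropagate the loss of one surrogate network.
Ws = [rng.standard_normal(8) for _ in range(3)]
grads = [lambda x, W=W: W * np.cos(W @ x) for W in Ws]

def aggregated_grad(x):
    # Aggregate across surrogate models first...
    return np.mean([g(x) for g in grads], axis=0)

def rescale(g, c=2.0, eps=1e-12):
    # ...then apply the instance-specific rescaling once to the fused gradient.
    z = np.log2(np.abs(g) + eps)
    z = (z - z.mean()) / (z.std() + eps)
    return c * np.sign(g) / (1.0 + np.exp(-z))

step = 0.01 * rescale(aggregated_grad(np.full(8, 0.5)))
```

Because the rescaling sees the ensemble-averaged gradient, the importance weights reflect coordinates that are consistently influential across surrogates, which is the mechanism behind the transferability gains reported for ensemble attacks.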

5. Clipping-Aware Instance-Specific Norm Rescaling

A complementary instance-specific rescaling technique is the analytic, differentiable, clipping-aware normalization and rescaling in the context of domain-restricted perturbations (Rauber et al., 2020). Given a base point $x \in D \subset \mathbb{R}^n$ and a perturbation direction $\delta$, the optimal scaling $\eta$ is analytically determined so that

$$v = \operatorname{clip}_{[a, b]}(x + \eta \delta), \qquad \|v - x\|_p = \epsilon$$

The procedure computes, for each coordinate, the maximal allowed $\eta$ before the clipping limit is exceeded, partitions the terms by saturation status, and solves for $\eta$ via sorting and cumulative sums. This yields an exact instance- and coordinate-dependent scaling factor, i.e., an exact norm-constrained projection after clipping, with full differentiability. It is computationally efficient ($\mathcal{O}(n \log n)$) and widely used for robust adversarial example generation in high-dimensional domains (Rauber et al., 2020).
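One way to implement the sort-and-cumulative-sum idea for $p = 2$ is sketched below. This is a reconstruction of the general approach, not the reference implementation; it assumes $x \in [a, b]^n$, $\eta \ge 0$, and a reachable $\epsilon$:

```python
import numpy as np

def clipping_aware_eta(x, delta, eps, a=0.0, b=1.0):
    """Solve ||clip(x + eta*delta, a, b) - x||_2 = eps for eta >= 0.

    Sketch of the sort-and-cumulative-sum approach for p = 2; assumes
    x lies in [a, b]^n and that the target norm eps is reachable."""
    with np.errstate(divide="ignore", invalid="ignore"):
        # eta at which each coordinate hits its clip bound (inf if delta_i = 0)
        eta_sat = np.where(delta > 0, (b - x) / delta,
                  np.where(delta < 0, (a - x) / delta, np.inf))
    order = np.argsort(eta_sat)
    eta_sorted = eta_sat[order]
    d2 = (delta ** 2)[order]
    # squared distance a coordinate contributes once it is saturated
    bound = np.where(delta > 0, b - x, np.where(delta < 0, a - x, 0.0))
    s2 = (bound ** 2)[order]
    # rem[k]: sum of delta^2 over coords not yet saturated; sat[k]: saturated part
    rem = np.concatenate([np.cumsum(d2[::-1])[::-1], [0.0]])
    sat = np.concatenate([[0.0], np.cumsum(s2)])
    for k in range(len(eta_sorted) + 1):
        if rem[k] <= 0:
            break  # remaining coords have delta = 0; the norm can no longer grow
        # On segment k, ||v - x||^2 = sat[k] + eta^2 * rem[k]; solve for eta.
        eta = np.sqrt(max(eps ** 2 - sat[k], 0.0) / rem[k])
        lo = eta_sorted[k - 1] if k > 0 else 0.0
        hi = eta_sorted[k] if k < len(eta_sorted) else np.inf
        if lo <= eta <= hi:
            return float(eta)
    # eps not reachable inside [a, b]^n: return the scale at full saturation
    return float(eta_sorted[max(k - 1, 0)])
```

A quick usage check: with `x = [0.5, 0.9]`, `delta = [1.0, 1.0]`, and `eps = 0.5`, the second coordinate saturates at $\eta = 0.1$, and the solver finds the $\eta$ for which the clipped perturbation has exact $\ell_2$ norm $0.5$.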

6. Empirical Performance and Transferability Gains

Extensive ImageNet experiments demonstrate the impact of instance-specific gradient rescaling on black-box adversarial success rates. When adversarial examples are crafted using S-FGRM, transferability to holdout architectures climbs substantially—up to 30–50 percentage points over strong sign-based baselines. For example, attacking Inception-v3 and evaluating on Inception-v4, MI-FGSM yields a 44.3% success rate versus 82.0% with SMI-FGRM. Performance gains are consistently observed across single-model, composite input transformation, and ensemble attack regimes, as well as against adversarially trained and certified models (Han et al., 2023).

| Attack Variant | Baseline Success (%) | S-FGRM Success (%) |
| --- | --- | --- |
| MI-FGSM → Inc-v4 | 44.3 | 82.0 |
| MI-CTM-FGSM → Res-101 | 78.1 | 88.9 |
| Ensemble MI-FGSM → IncRes-v2_ens | 27.9 | 76.6 |

The critical factor underlying these gains is the preservation of input-specific gradient geometry, which aligns perturbations more closely with the true structure of the loss surface, thereby increasing transfer potential for black-box targets.

7. Broader Applications and Implementation Considerations

Instance-specific gradient rescaling generalizes beyond adversarial attacks. It is relevant for robust optimization, data augmentation, and any domain where reliable norm-constrained updates require alignment with the true sensitivity landscape of a loss. Important practical considerations for implementation include efficient per-sample normalization, numerically stable logarithmic and sigmoid transforms, and batched calculation of statistics to avoid computational overhead. The analytic rescaling methodology is compatible with autodifferentiation frameworks and is supported by native code in major ML libraries (Rauber et al., 2020).
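For batched use, the per-sample statistics can be computed along the feature axis so that each instance keeps its own normalization. The sketch below assumes gradients are flattened to one row per sample, with `eps` as a numerical-stability guard (an implementation choice):

```python
import numpy as np

def rescale_batched(G, c=2.0, eps=1e-12):
    """Batched instance-specific rescaling: statistics are computed per row
    (per sample), so each instance gets its own normalization."""
    logmag = np.log2(np.abs(G) + eps)                  # numerically stable log-compression
    mean = logmag.mean(axis=1, keepdims=True)          # per-sample mean
    std = logmag.std(axis=1, keepdims=True) + eps      # per-sample std, guarded against zero
    z = (logmag - mean) / std
    return c * np.sign(G) / (1.0 + np.exp(-z))

G = np.random.default_rng(0).standard_normal((4, 100))
R = rescale_batched(G)
```

Computing the statistics with `keepdims=True` keeps broadcasting correct and makes the per-sample property explicit: permuting the rows of `G` simply permutes the rows of the output.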

The methodology’s extensibility and empirical gains highlight its importance in the ongoing advancement of robust machine learning and adversarial research.
