Gradient Supplementary Module (GSM)

Updated 8 January 2026
  • GSM is a neural module that integrates raw gradient data using convolution and SE attention blocks to enhance edge positioning in infrared images.
  • It fuses gradient and main branch features through a residual pathway, improving spatial detail and small target discrimination.
  • Ablation studies show that the full GSM configuration achieves an IoU of 0.8142, validating its effectiveness in enhancing edge fidelity.

The Gradient Supplementary Module (GSM) is a neural architecture component introduced in the paper "Gradient-Guided Learning Network for Infrared Small Target Detection" to encode raw gradient information into deep network layers. Designed specifically to alleviate inaccurate edge positioning and to improve small target discrimination in infrared imagery, GSM systematically fuses gradient magnitude information with learned feature maps, enhancing spatial detail representation and feature extraction capacity (Zhao et al., 10 Dec 2025).

1. Raw Gradient Magnitude Computation

GSM operates on the gradient magnitude image derived from the input intensity map. Although the precise operator is not specified, the referenced computation aligns with standard image-processing practice. The 2D gradient magnitude at pixel (x, y) is computed as

G_x(x, y) = I(x+1, y) − I(x−1, y),    G_y(x, y) = I(x, y+1) − I(x, y−1)

G(x, y) = √(G_x(x, y)² + G_y(x, y)²)

A typical implementation employs 3×3 Sobel kernels for G_x and G_y, with the gradient magnitude G assembled channel-wise as √(G_x² + G_y²). This ensures the extraction of the edge details necessary for robust infrared target delineation.
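As a sketch of the computation above, the central-difference form can be implemented directly in NumPy (the paper does not fix the exact operator; the edge-padding choice here is an assumption):

```python
import numpy as np

def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Central-difference gradient magnitude of a 2D intensity map.

    Borders are edge-padded so the output has the same shape as the input.
    Here x is the column index and y is the row index.
    """
    I = np.pad(img.astype(np.float64), 1, mode="edge")
    gx = I[1:-1, 2:] - I[1:-1, :-2]   # I(x+1, y) - I(x-1, y)
    gy = I[2:, 1:-1] - I[:-2, 1:-1]   # I(x, y+1) - I(x, y-1)
    return np.sqrt(gx**2 + gy**2)
```

A Sobel-kernel variant would add a smoothing component perpendicular to each difference direction but yields qualitatively similar edge maps.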

2. Structural Composition of GSM

GSM is composed of two primary blocks at each Stage of the network:

  • G_Block: This consists of a single 3×3 convolution (padding=1, stride=1) applied to a spatially pooled gradient magnitude map, immediately followed by a squeeze-and-excitation (SE) attention block.
  • Res (Residual Fusion): The Res block performs feature fusion, combining the main branch feature map F_main and the SE-weighted output of G_Block, F'_grad, via element-wise summation:

F_out = F_main + F'_grad

where F'_grad is the SE transformation of the gradient feature F_grad.

The SE block follows the established formulation: a global average pool yields the channel descriptor z, which is passed through a two-layer bottleneck

s = σ(W_2 δ(W_1 z))

with W_1 reducing the channel width by a ratio r, W_2 restoring the channel dimension, δ the ReLU activation, and σ the sigmoid. Channel and architectural hyperparameters are not enumerated in the referenced work.

3. Attention Mechanisms

Channel attention within GSM utilizes a squeeze-and-excitation (SE) block to recalibrate feature responses adaptively. The channel-wise weights s are derived by globally averaging each channel, then passing the result through a two-layer fully connected bottleneck (with reduction ratio r) and a sigmoid activation. The reweighted features are produced as

F'_grad = s ⊙ F_grad

where ⊙ denotes channel-wise multiplication.

No spatial attention or gating is introduced in GSM beyond this channel-wise mechanism.
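Under these definitions, the SE recalibration can be sketched in NumPy as follows (the weight shapes and ReLU nonlinearity are standard SE conventions, not values stated in the paper):

```python
import numpy as np

def se_reweight(f_grad: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation channel reweighting (sketch).

    f_grad : gradient feature map, shape (C, H, W)
    w1     : squeeze weights, shape (C//r, C) -- bottleneck with reduction ratio r
    w2     : excitation weights, shape (C, C//r) -- restores the channel width
    Returns s * f_grad, with s = sigmoid(w2 @ relu(w1 @ z)) and z the
    per-channel global average (the "squeeze" descriptor).
    """
    z = f_grad.mean(axis=(1, 2))                                 # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))    # excitation: (C,)
    return s[:, None, None] * f_grad                             # channel-wise scale
```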

4. Fusion of Gradient and Main Branch Features

At each Stage, GSM fuses gradient-derived and main branch features via a residual pathway:

  1. The raw gradient map is spatially downsampled (max-pooling) to the resolution of the current Stage.
  2. Convolution with a 3×3 kernel produces the intermediate feature F_grad.
  3. SE attention reweights F_grad, yielding F'_grad.
  4. The main branch feature F_main and the SE-weighted gradient feature F'_grad are summed:

F_out = F_main + F'_grad

This mechanism directly incorporates gradient information, biasing the feature maps toward enhanced edge preservation.
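The four steps above can be combined into one stage-level sketch in NumPy (the 2×2 pooling window, single-channel gradient input, and weight shapes are illustrative assumptions, not values from the paper):

```python
import numpy as np

def gsm_stage(f_main, grad_map, conv_w, conv_b, w1, w2):
    """One GSM stage (sketch): pool -> 3x3 conv -> SE -> residual add.

    f_main   : main-branch features, shape (C, H, W)
    grad_map : raw gradient magnitude at resolution (2H, 2W)
    conv_w   : 3x3 kernel lifting 1 gradient channel to C, shape (C, 1, 3, 3)
    conv_b   : conv bias, shape (C,)
    w1, w2   : SE bottleneck weights, shapes (C//r, C) and (C, C//r)
    """
    C, H, W = f_main.shape
    # 1) downsample the gradient map to the stage resolution (2x2 max-pool)
    g = grad_map.reshape(H, 2, W, 2).max(axis=(1, 3))
    # 2) 3x3 convolution with padding=1 -> intermediate gradient feature F_grad
    gp = np.pad(g, 1)
    f_grad = np.zeros((C, H, W)) + conv_b[:, None, None]
    for i in range(3):
        for j in range(3):
            f_grad += conv_w[:, 0, i, j][:, None, None] * gp[i:i+H, j:j+W]
    # 3) SE channel attention -> F'_grad
    z = f_grad.mean(axis=(1, 2))
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    f_grad_p = s[:, None, None] * f_grad
    # 4) residual fusion: F_out = F_main + F'_grad
    return f_main + f_grad_p
```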

5. Integration Within Dual-Branch Network Architecture

GSM is deployed at the terminus of every Stage in the main branch of the dual-branch network. In parallel, the supplementary branch sequentially pools the original gradient magnitude image, matching its resolution to each Stage before feeding it to the corresponding GSM instance. This dual path ensures multi-scale extraction and integration of gradient information throughout the network depth.
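A minimal sketch of the supplementary branch's sequential pooling (the stage count and the 2×2 max-pooling operator are assumptions for illustration):

```python
import numpy as np

def gradient_pyramid(grad_map: np.ndarray, num_stages: int = 4) -> list:
    """Sequentially pool the full-resolution gradient magnitude image so each
    level matches the feature resolution of the corresponding Stage.

    Assumes H and W are divisible by 2**(num_stages - 1).
    Returns [level_1, ..., level_num_stages], halving resolution each step.
    """
    levels = [grad_map]
    for _ in range(num_stages - 1):
        g = levels[-1]
        H, W = g.shape
        # 2x2 max-pool with stride 2
        levels.append(g.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3)))
    return levels
```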

6. Ablation Study and Empirical Results

Controlled ablation experiments outlined in Table II of the reference demonstrate the efficacy of GSM. When the residual fusion is replaced with simple addition (M_G_Add), the IoU drops by 1.28%. Shifting the G_Block to the supplementary branch (M-G-M_Res) diminishes IoU by 0.34%. The network with the full GSM achieves the highest metrics (IoU: 0.8142, nIoU: 0.7858), confirming its contribution to edge fidelity and small target discrimination in infrared images.

Variant | IoU | nIoU | Δ vs. full GSM (IoU, nIoU)
M_G_Add | 0.8038 | 0.7797 | −1.28%, −0.78%
M-G-M_Res | 0.8108 | 0.7817 | −0.34%, −0.41%
Full GSM | 0.8142 | 0.7858 | baseline

7. Implementation Details and Hyperparameters

Key training parameters are:

  • Learning rate: as reported in (Zhao et al., 10 Dec 2025)
  • Batch size: 4
  • Number of epochs: 500
  • Optimizer: Not explicitly stated (likely Adam)
  • Weight initialization: Not detailed (default PyTorch assumed)
  • Regularization: Not specified beyond typical weight decay

Code availability is indicated via the referenced GitHub repository; all essential implementation details for GSM are sourced from the original work (Zhao et al., 10 Dec 2025). No additional architectural, optimization, or regularization specifics are enumerated.

References

Zhao et al., "Gradient-Guided Learning Network for Infrared Small Target Detection," 10 Dec 2025.
