Gradient Supplementary Module (GSM)
- GSM is a neural module that integrates raw gradient data using convolution and SE attention blocks to enhance edge positioning in infrared images.
- It fuses gradient and main branch features through a residual pathway, improving spatial detail and small target discrimination.
- Ablation studies show that the full GSM configuration achieves an IoU of 0.8142, validating its effectiveness in enhancing edge fidelity.
The Gradient Supplementary Module (GSM) is a neural architecture component introduced in "Gradient-Guided Learning Network for Infrared Small Target Detection" to encode raw gradient information into deep network layers. Designed to alleviate inaccurate edge positioning and to improve small target discrimination in infrared imagery, GSM systematically fuses gradient magnitude information with learned feature maps, enhancing spatial detail representation and feature extraction capacity (Zhao et al., 10 Dec 2025).
1. Raw Gradient Magnitude Computation
GSM operates on the gradient magnitude image derived from the input intensity map. Although the precise operator is not specified, the referenced computation aligns with standard image-processing practice. The 2D gradient magnitude at pixel $(x, y)$ is computed as

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2},$$

where $G_x$ and $G_y$ are the horizontal and vertical derivative responses. A typical implementation employs 3×3 Sobel kernels for $G_x$ and $G_y$, with the gradient magnitude assembled channel-wise as above. This approach extracts the edge detail necessary for robust infrared target delineation.
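Since the exact operator is not specified in the paper, the computation above can be sketched with the standard Sobel kernels in PyTorch (the function name and the small epsilon are illustrative choices, not from the paper):

```python
import torch
import torch.nn.functional as F

def gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Per-channel Sobel gradient magnitude of an (N, C, H, W) tensor.

    Uses the standard 3x3 Sobel pair; the paper does not pin down the
    operator, so this is an assumed, typical implementation.
    """
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], dtype=img.dtype).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # vertical Sobel kernel is the transpose
    c = img.shape[1]
    # depthwise convolution: apply one Sobel pair per input channel
    gx = F.conv2d(img, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(img, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    # small epsilon keeps the sqrt gradient finite in flat regions
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)
```

A vertical step edge yields a strong response along the edge column and near-zero response in flat regions, which is the behavior the supplementary branch relies on.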
2. Structural Composition of GSM
GSM is composed of two primary blocks at each Stage of the network:
- G_Block: This consists of a single 3×3 convolution (padding=1, stride=1) applied to a spatially pooled gradient magnitude map, immediately followed by a squeeze-and-excitation (SE) attention block.
- Res (Residual Fusion): The Res block performs feature fusion, combining the main branch feature map $F_{\text{main}}$ and the SE-weighted output of G_Block, $\tilde{F}_g$, via element-wise summation:

$$F_{\text{out}} = F_{\text{main}} + \tilde{F}_g,$$

where $\tilde{F}_g = \mathrm{SE}(F_g)$ is the SE transformation of the gradient feature.
The SE block follows the established formulation:

$$s = \sigma\!\left(W_2\,\delta\!\left(W_1\,\mathrm{GAP}(F)\right)\right), \qquad \tilde{F} = s \odot F,$$

with $W_1$ reducing the channel width by a ratio $r$ and $W_2$ restoring the channel dimension; $\mathrm{GAP}$ denotes global average pooling, $\delta$ the ReLU activation, $\sigma$ the sigmoid, and $\odot$ channel-wise scaling. Channel and architectural hyperparameters are not enumerated in the referenced work.
3. Attention Mechanisms
Channel attention within GSM utilizes a squeeze-and-excitation (SE) block to recalibrate the feature responses adaptively. The channel-wise weights $s$ are derived by globally averaging each channel, then passing the result through a two-layer fully connected bottleneck (with reduction ratio $r$) and a sigmoid activation. The reweighted features are produced as

$$\tilde{F} = s \odot F.$$
No spatial attention or gating is introduced in GSM beyond this channel-wise mechanism.
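The channel-wise recalibration above corresponds to a standard SE block. A minimal PyTorch sketch (the reduction ratio and channel widths are assumptions, since the paper does not enumerate them):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pool -> bottleneck -> sigmoid scale."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),   # W1: squeeze channel width by r
            nn.ReLU(inplace=True),         # delta
            nn.Linear(hidden, channels),   # W2: restore channel dimension
            nn.Sigmoid(),                  # sigma: weights s in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        s = self.fc(x.mean(dim=(2, 3)))    # GAP over spatial dims, then bottleneck
        return x * s.view(n, c, 1, 1)      # channel-wise reweighting, s ⊙ F
```

Because each weight lies in (0, 1), the block can only attenuate channels relative to the input, which is the adaptive recalibration described above.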
4. Fusion of Gradient and Main Branch Features
At each Stage, GSM fuses gradient-derived and main branch features via a residual pathway:
- The raw gradient map is spatially downsampled (max-pooling) to the resolution of the current Stage.
- Convolution with a 3×3 kernel produces an intermediate feature $F_g$.
- SE attention reweights $F_g$, yielding $\tilde{F}_g$.
- The main branch feature $F_{\text{main}}$ and the SE-weighted gradient feature $\tilde{F}_g$ are summed:

$$F_{\text{out}} = F_{\text{main}} + \tilde{F}_g.$$
This mechanism directly incorporates gradient information, biasing the feature maps toward enhanced edge preservation.
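The pool → conv → SE → residual-sum pathway can be sketched as a single PyTorch module. Channel widths, the pooling factor, and the SE reduction ratio are assumptions not fixed by the paper; only the sequence of operations is taken from the source:

```python
import torch
import torch.nn as nn

class GSM(nn.Module):
    """Sketch of the GSM fusion path: pool -> 3x3 conv -> SE -> residual sum."""

    def __init__(self, channels: int, pool_stride: int, reduction: int = 16):
        super().__init__()
        # downsample the raw gradient map to the current Stage resolution
        self.pool = nn.MaxPool2d(kernel_size=pool_stride, stride=pool_stride)
        # G_Block: a single 3x3 conv (padding=1, stride=1) lifting the
        # 1-channel gradient map to the Stage's channel width
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1, stride=1)
        # SE attention on the gradient feature
        hidden = max(channels // reduction, 1)
        self.se = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid(),
        )

    def forward(self, f_main: torch.Tensor, grad_map: torch.Tensor) -> torch.Tensor:
        g = self.conv(self.pool(grad_map))              # intermediate feature F_g
        n, c, _, _ = g.shape
        s = self.se(g.mean(dim=(2, 3))).view(n, c, 1, 1)
        return f_main + g * s                           # residual fusion
```

The residual sum leaves the main branch features intact and merely adds an edge-biased correction, which is why ablating it (simple addition without the G_Block/SE path) degrades IoU.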
5. Integration Within Dual-Branch Network Architecture
GSM is deployed at the terminus of every Stage in the main branch of the dual-branch network. In parallel, the supplementary branch sequentially pools the original gradient magnitude image, matching its resolution to each Stage before feeding it to the corresponding GSM instance. This dual path ensures multi-scale extraction and integration of gradient information throughout the network depth.
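The supplementary branch's sequential pooling can be sketched as follows. The stage count of four and the ×2 pooling factor per Stage are assumptions; the paper only states that resolutions are matched:

```python
import torch
import torch.nn.functional as F

def gradient_pyramid(grad_map: torch.Tensor, num_stages: int = 4) -> list:
    """Sequentially halve the gradient map so each level matches one Stage.

    grad_map: (N, 1, H, W) full-resolution gradient magnitude.
    Returns one tensor per Stage, each fed to that Stage's GSM instance.
    """
    levels = []
    g = grad_map
    for _ in range(num_stages):
        g = F.max_pool2d(g, kernel_size=2, stride=2)  # match next Stage resolution
        levels.append(g)
    return levels
```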
6. Ablation Study and Empirical Results
Controlled ablation experiments outlined in Table II of the reference demonstrate the efficacy of GSM. When the residual fusion is replaced with simple addition (M_G_Add), the IoU drops by 1.28%. Shifting the G_Block to the supplementary branch (M-G-M_Res) diminishes IoU by 0.34%. The network with the full GSM achieves the highest metrics (IoU: 0.8142, nIoU: 0.7858), confirming its contribution to edge fidelity and small target discrimination in infrared images.
| Variant | IoU | nIoU | Relative Performance |
|---|---|---|---|
| M_G_Add | 0.8038 | 0.7797 | −1.28%, −0.78% |
| M-G-M_Res | 0.8108 | 0.7817 | −0.34%, −0.41% |
| Full GSM | 0.8142 | 0.7858 | Baseline |
7. Implementation Details and Hyperparameters
Key training parameters are:
- Learning rate: 5
- Batch size: 4
- Number of epochs: 500
- Optimizer: Not explicitly stated (likely Adam)
- Weight initialization: Not detailed (default PyTorch assumed)
- Regularization: Not specified beyond typical weight decay
Code is available at the referenced GitHub repository; all essential implementation details for GSM are sourced from the original work (Zhao et al., 10 Dec 2025). No additional architectural, optimization, or regularization specifics are enumerated.