Perturbation-guided Path Module (PGM)
- Perturbation-guided Path Module (PGM) is a mechanism that converts multi-scale feature tensors into trajectory-aware sequences to trace subtle signal perturbations in infrared detection.
- It employs energy field construction and gradient-following walks to expose anisotropic diffusion patterns, overcoming limitations of conventional CNNs and ViTs.
- Empirical results demonstrate its efficacy, boosting IoU from 63.12% to 81.04% and significantly reducing false alarm rates in infrared small target detection.
The Perturbation-guided Path Module (PGM) is a trajectory extraction mechanism used in TAPM-Net, a specialized architecture for infrared small target detection (ISTD) that models the directional propagation of small target-induced disturbances in deep feature spaces. PGM transforms multi-scale feature tensors into trajectory-aware sequences by tracing energy fields that reflect localized feature discontinuities, enabling physically motivated state-space propagation and facilitating highly accurate detection in cluttered environments (Xie et al., 9 Jan 2026).
1. Rationale and Problem Formulation
PGM addresses the challenge posed by ISTD tasks where targets are characterized by weak, spatially localized perturbations that are often obscured by structured noise and complex backgrounds. Conventional CNNs and ViTs highlight high-contrast regions and textures but lack mechanisms to trace or exploit the anisotropic diffusion patterns caused by true signal perturbations. PGM is designed to expose the spatial diffusion behavior of these signal perturbations by constructing explicit feature trajectories that encode directional layer-wise responses, thus serving as discriminative cues for downstream processing. This approach is particularly salient in ISTD, where distinguishing subtle signal ripples from pervasive noise is essential.
2. Core Methodology
PGM operates independently at each encoder stage within the TAPM-Net backbone, typically implemented as a four-stage U-Net hierarchy. The input at stage $s$ is a feature tensor $F_s \in \mathbb{R}^{C \times H \times W}$, where $C$ is the channel count and $H \times W$ are the spatial dimensions at that stage. The processing pipeline consists of the following steps:
- Energy Field Construction: Collapses $F_s$ into a scalar energy map $E_s \in \mathbb{R}^{H \times W}$ by summing the absolute first-order channel-wise finite differences, emphasizing discontinuities indicative of perturbations.
- Seed Point Identification: Detects the local maxima in the energy map $E_s$ using non-maximum suppression and thresholding, yielding the initial positions for trajectory extraction.
- Gradient-following Walks: From each seed, performs a gradient-following walk of length $T$:
$$p_{t+1} = p_t + \eta \, \frac{\nabla E_s(p_t)}{\lVert \nabla E_s(p_t) \rVert + \epsilon}$$
where $\nabla E_s$ is the spatial gradient of the energy map $E_s$ (computed with a Sobel filter), $\eta$ is the step size, and $\epsilon$ maintains numerical stability. Bilinear interpolation extracts the corresponding feature vector at each path location $p_t$.
- Trajectory Assembly: The set of interpolated vectors along each trajectory yields a feature sequence for each seed. Multiple trajectories are extracted per stage.
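The four steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the energy map is read as differences along the channel axis, `np.gradient` (central differences) stands in for the Sobel filter, the walk ascends the gradient unless the hypothetical `direction` argument is flipped, and all function names and default values are illustrative.

```python
import numpy as np

def energy_map(feat):
    """feat: (C, H, W). Sum of absolute first-order differences along the
    channel axis (one assumed reading of 'channel-wise finite differences')."""
    return np.abs(np.diff(feat, axis=0)).sum(axis=0)

def find_seeds(energy, threshold, max_seeds=40):
    """3x3 non-maximum suppression plus thresholding; returns (K, 2) float (y, x)."""
    H, W = energy.shape
    padded = np.pad(energy, 1, mode="constant", constant_values=-np.inf)
    # Nine shifted views of the padded map cover each pixel's 3x3 neighbourhood.
    neigh = np.stack([padded[dy:dy + H, dx:dx + W]
                      for dy in range(3) for dx in range(3)])
    is_peak = (energy >= neigh.max(axis=0)) & (energy > threshold)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-energy[ys, xs])[:max_seeds]  # strongest peaks first
    return np.stack([ys[order], xs[order]], axis=1).astype(float)

def bilinear(arr, y, x):
    """Sample arr of shape (..., H, W) at a continuous (y, x), clamped to the border."""
    H, W = arr.shape[-2:]
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * arr[..., y0, x0] + wx * arr[..., y0, x1]
    bot = (1 - wx) * arr[..., y1, x0] + wx * arr[..., y1, x1]
    return (1 - wy) * top + wy * bot

def walk(feat, energy, seed, length=10, step=1.0, eps=1e-6, direction=1.0):
    """Normalized gradient walk; returns tokens (length, C) and path (length, 2)."""
    H, W = energy.shape
    grad = np.stack(np.gradient(energy))  # (2, H, W): d/dy, d/dx
    pos = np.asarray(seed, dtype=float)
    tokens, path = [], []
    for _ in range(length):
        path.append(pos.copy())
        tokens.append(bilinear(feat, pos[0], pos[1]))
        g = bilinear(grad, pos[0], pos[1])
        pos = pos + direction * step * g / (np.linalg.norm(g) + eps)
        pos = np.clip(pos, 0.0, [H - 1.0, W - 1.0])
    return np.stack(tokens), np.stack(path)

# Synthetic stage feature: a Gaussian blob whose sign alternates across channels.
C, H, W = 8, 32, 32
yy, xx = np.mgrid[0:H, 0:W]
blob = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
signs = np.where(np.arange(C) % 2 == 0, 1.0, -1.0)
feat = signs[:, None, None] * blob[None]

energy = energy_map(feat)
seeds = find_seeds(energy, threshold=1.0)
trajectories = [walk(feat, energy, s)[0] for s in seeds]  # each (length, C)
```

On this perfectly symmetric toy input the gradient vanishes at the seed, so an ascent walk started exactly at the peak stays in place. Starting from an off-peak position, or flipping `direction`, produces a moving path.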
3. Integration within TAPM-Net Architecture
Each trajectory sequence serves as input to the Trajectory-Aware State Block (TASB), which implements a Mamba-based state-space recurrence on the paths. Within TASB, parameterized updates via its state-space matrices model dynamic propagation and velocity-constrained diffusion. The processed trajectory states are concatenated with the original sampled features, word-level embeddings, and sentence-level embeddings, then linearly projected and re-scattered onto the spatial feature map, with collisions (several trajectory tokens landing on the same pixel) resolved by averaging. A residual fusion mechanism, weighted by a fusion hyperparameter, merges trajectory-enhanced features back into the backbone, ensuring global coherence and skip-connection compatibility.
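The re-scatter and fusion step can be sketched as follows. This is a minimal illustration under stated assumptions: tokens are written to the nearest pixel, the fusion takes the additive form `feat + weight * traj_map`, and the names `scatter_average`, `residual_fuse`, and `weight` are hypothetical rather than taken from TAPM-Net.

```python
import numpy as np

def scatter_average(tokens, positions, shape):
    """Write tokens (N, C) at continuous positions (N, 2) onto an (H, W) grid.

    Tokens that land on the same pixel are averaged.
    Returns the (C, H, W) map and the (H, W) visit counts.
    """
    H, W = shape
    idx = np.rint(positions).astype(int)  # nearest pixel (assumed)
    ys = np.clip(idx[:, 0], 0, H - 1)
    xs = np.clip(idx[:, 1], 0, W - 1)
    acc = np.zeros((H, W, tokens.shape[1]))
    count = np.zeros((H, W))
    # np.add.at accumulates correctly when the same index appears more than once.
    np.add.at(acc, (ys, xs), tokens)
    np.add.at(count, (ys, xs), 1.0)
    mean = acc / np.maximum(count, 1.0)[..., None]
    return mean.transpose(2, 0, 1), count

def residual_fuse(feat, traj_map, weight):
    """Assumed additive residual form: original features plus weighted trajectory map."""
    return feat + weight * traj_map

feat = np.zeros((2, 4, 4))
tokens = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
positions = np.array([[1.0, 1.0], [1.0, 1.0], [2.0, 3.0]])  # first two collide

traj_map, count = scatter_average(tokens, positions, feat.shape[1:])
fused = residual_fuse(feat, traj_map, weight=0.5)
```

Plain fancy-index assignment (`acc[ys, xs] += tokens`) would silently drop repeated indices, which is why the sketch uses `np.add.at` for the collision case.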
PGM further computes an auxiliary perturbation response map by re-projecting trajectory tokens and normalizing by pixel visit frequency. This map is supervised by the ground-truth mask via binary cross-entropy, enforcing spatial alignment of extracted trajectories with true object regions.
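A rough sketch of the auxiliary supervision follows, assuming each trajectory token has already been reduced to a scalar score in $[0, 1]$ (for example by a learned projection followed by a sigmoid, which is not shown). The function names and the choice to leave unvisited pixels at zero are assumptions for illustration.

```python
import numpy as np

def response_map(scores, positions, shape):
    """Re-project per-token scores (N,) onto the grid, normalized by visit frequency."""
    H, W = shape
    idx = np.rint(positions).astype(int)
    ys = np.clip(idx[:, 0], 0, H - 1)
    xs = np.clip(idx[:, 1], 0, W - 1)
    total = np.zeros(shape)
    visits = np.zeros(shape)
    np.add.at(total, (ys, xs), scores)
    np.add.at(visits, (ys, xs), 1.0)
    # Pixels never visited by a trajectory keep a response of zero (assumed).
    return total / np.maximum(visits, 1.0), visits

def bce(prob, mask, eps=1e-7):
    """Mean binary cross-entropy between a probability map and a binary mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)).mean())

scores = np.array([0.9, 0.7, 0.2])
positions = np.array([[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
mask = np.zeros((4, 4))
mask[1, 1] = 1.0  # ground-truth target pixel

resp, visits = response_map(scores, positions, mask.shape)
loss = bce(resp, mask)
```

Dividing by the visit count keeps pixels crossed by many trajectories from dominating the map simply because they were sampled more often.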
4. Hyperparameters and Computational Characteristics
PGM incorporates several tunable hyperparameters:
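- Stage count: four in the typical U-Net hierarchy
- Typical trajectory length: $10$–$20$ steps
- Step size of the gradient-following walk
- Stability constant of the walk update
- Seed threshold for local-maximum selection
- Fusion weight of the residual fusion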
- Feature map resolution: halves per stage
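For a stage feature map with $C$ channels and spatial size $H \times W$, complexity per stage is $O(CHW)$ for the energy map and $O(KTC)$ for the trajectory walks, where $K$ is the seed count and $T$ is the trajectory length. Because $K$ remains low (typically 10–40) and $T$ and $C$ are moderate, overall overhead is substantially less than for full self-attention modules.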
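To make the comparison concrete, the following back-of-the-envelope sketch counts leading-order operations only. The stage dimensions are hypothetical, the PGM counts assume one operation per channel per pixel (energy map) and per channel per walk step (trajectories), and the self-attention figure uses the standard quadratic-in-tokens estimate with one token per pixel.

```python
def pgm_cost(C, H, W, K, T):
    """Leading-order operation count: energy map plus K walks of T steps."""
    energy = C * H * W
    walks = K * T * C
    return energy + walks

def self_attention_cost(C, H, W):
    """Leading-order cost of full self-attention over H*W tokens of width C."""
    n = H * W
    return n * n * C

# Hypothetical stage: 64 channels at 64x64, 40 seeds, 20 steps per walk.
pgm = pgm_cost(C=64, H=64, W=64, K=40, T=20)
attn = self_attention_cost(C=64, H=64, W=64)
ratio = attn / pgm
```

Under these assumed dimensions the energy map dominates the PGM cost, and the ratio grows with spatial resolution because only the attention term is quadratic in $HW$.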
5. Empirical Evaluation
PGM demonstrates significant performance improvements on public ISTD benchmarks. On NUAA-SIRST, adding PGM to a U-Net backbone boosts Intersection-over-Union (IoU) from 63.12% to 81.04%, while also reducing the false alarm rate and improving detection probability relative to the baseline. Ablation studies illustrate component-wise contributions:
| Variant | IoU (%) | False Alarm Rate (%) | Detection Probability (%) |
|---|---|---|---|
| U-Net (baseline) | 63.12 | 43.23 | 91.23 |
| Energy Map Only | 50.44 | — | — |
| Energy Map + Trajectory | 74.78 | — | — |
| Full PGM + Fusion | 81.04 | — | — |
The results confirm that performance gains derive from the combined use of energy-based estimation, gradient-guided trajectory extraction, and spatially fused back-projection, rather than from energy mapping or feature fusion independently.
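For reference, the pixel-level metrics reported above can be computed along the following lines. This is a generic sketch rather than the benchmark's official evaluation code: the false alarm rate is taken as false-positive pixels over all pixels, the empty-union convention is an assumption, and target-level detection probability is omitted because it requires connected-component matching.

```python
import numpy as np

def iou(pred, mask):
    """Intersection-over-Union between two binary maps."""
    pred, mask = pred.astype(bool), mask.astype(bool)
    union = np.logical_or(pred, mask).sum()
    if union == 0:
        return 1.0  # both maps empty: treated as a perfect match (assumed convention)
    return float(np.logical_and(pred, mask).sum() / union)

def false_alarm_rate(pred, mask):
    """False-positive pixels divided by the total pixel count (assumed definition)."""
    pred, mask = pred.astype(bool), mask.astype(bool)
    return float(np.logical_and(pred, ~mask).sum() / pred.size)

pred = np.zeros((4, 4), dtype=int)
mask = np.zeros((4, 4), dtype=int)
pred[1, 1] = pred[1, 2] = pred[2, 2] = 1   # predicted target pixels
mask[1, 1] = mask[1, 2] = mask[0, 0] = 1   # ground-truth target pixels

score = iou(pred, mask)               # 2 shared pixels out of 4 in the union
fa = false_alarm_rate(pred, mask)     # 1 false-positive pixel out of 16
```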
6. Significance and Implications
The PGM offers a physically motivated, data-driven strategy for detecting and modeling the subtle spatial ripples induced by small targets in high-dimensional feature spaces. By converting raw perturbations into explicit trajectories, it enables subsequent state-space modeling (via TASB) to exploit both local diffusion behaviors and global semantic coherence. A plausible implication is that PGM’s methodology may generalize to other vision tasks involving weak or sparse signal propagation, particularly where standard backbones fail to encode directional perturbation propagation. The demonstrated balance of accuracy and computational efficiency on ISTD benchmarks highlights PGM as a distinctive advancement in trajectory-centric deep feature analysis (Xie et al., 9 Jan 2026).