
Perturbation-guided Path Module (PGM)

Updated 16 January 2026
  • Perturbation-guided Path Module (PGM) is a mechanism that converts multi-scale feature tensors into trajectory-aware sequences to trace subtle signal perturbations in infrared detection.
  • It employs energy field construction and gradient-following walks to expose anisotropic diffusion patterns, overcoming limitations of conventional CNNs and ViTs.
  • Empirical results on infrared small target detection benchmarks show IoU rising from 63.12% (U-Net baseline) to 81.04% in the full PGM-plus-fusion configuration, together with a sharply reduced false alarm rate.

The Perturbation-guided Path Module (PGM) is a trajectory extraction mechanism used in TAPM-Net, a specialized architecture for infrared small target detection (ISTD) that models the directional propagation of small-target-induced disturbances in deep feature spaces. PGM transforms multi-scale feature tensors into trajectory-aware sequences by tracing energy fields that reflect localized feature discontinuities, enabling physically motivated state-space propagation and highly accurate detection in cluttered environments (Xie et al., 9 Jan 2026).

1. Rationale and Problem Formulation

PGM addresses a core difficulty of ISTD: targets appear as weak, spatially localized perturbations that are often obscured by structured noise and complex backgrounds. Conventional CNNs and ViTs highlight high-contrast regions and textures but lack mechanisms to trace or exploit the anisotropic diffusion patterns caused by true signal perturbations. PGM is designed to expose the spatial diffusion behavior of these perturbations by constructing explicit feature trajectories that encode directional, layer-wise responses, which then serve as discriminative cues for downstream processing. This approach is particularly relevant to ISTD, where distinguishing subtle signal ripples from pervasive noise is essential.

2. Core Methodology

PGM operates independently at each encoder stage $l$ of the TAPM-Net backbone, which is typically implemented as a four-stage U-Net hierarchy. The input at stage $l$ is a feature tensor $\mathbf{F}^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$, where $C_l$ is the channel count and $H_l, W_l$ are the spatial dimensions. The processing pipeline consists of the following steps:

  1. Energy Field Construction: Collapses $\mathbf{F}^{(l)}$ into a scalar energy map $\mathcal{E}^{(l)} \in \mathbb{R}^{H_l \times W_l}$ by summing, over all channels, the absolute first-order central differences along both spatial axes, emphasizing discontinuities indicative of perturbations:

$$\mathcal{E}^{(l)}(x,y) = \sum_{c=1}^{C_l} \Big( \big|F^{(l)}_c(x+1,y) - F^{(l)}_c(x-1,y)\big| + \big|F^{(l)}_c(x,y+1) - F^{(l)}_c(x,y-1)\big| \Big)$$

  2. Seed Point Identification: Detects local maxima in $\mathcal{E}^{(l)}$ using non-maximum suppression and thresholding, yielding the initial positions for trajectory extraction.
  3. Gradient-following Walks: From each seed, performs a gradient-following walk of length $L$:

$$p_{j+1} = p_j + \eta\,\frac{\nabla \mathcal{E}^{(l)}(p_j)}{\lVert \nabla \mathcal{E}^{(l)}(p_j) \rVert_2 + \epsilon}$$

where $\nabla \mathcal{E}^{(l)}$ is the spatial gradient (computed with a Sobel or similar filter), $\eta$ is the step size, and $\epsilon$ is a small constant for numerical stability. At each path location $p_j$, bilinear interpolation extracts the corresponding feature vector $f_j \in \mathbb{R}^{C_l}$ from $\mathbf{F}^{(l)}$.

  4. Trajectory Assembly: The interpolated vectors along each trajectory form a feature sequence $\mathcal{P}^{(l)} = [f_1,\ldots,f_L] \in \mathbb{R}^{L \times C_l}$ for each seed. Multiple trajectories are extracted per stage.
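The first two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the TAPM-Net implementation: `energy_map` and `find_seeds` are hypothetical names, $x$ is taken to index columns and $y$ rows, borders use edge padding, and non-maximum suppression is approximated by a 3×3 local-maximum test plus the relative seed threshold of $0.1 \times \max \mathcal{E}^{(l)}$.

```python
import numpy as np


def energy_map(F: np.ndarray) -> np.ndarray:
    """Hypothetical helper: channel sum of |central difference| along x and y.

    F has shape (C, H, W); x indexes columns, y indexes rows (assumed).
    Borders use edge padding, which is an assumption.
    """
    P = np.pad(F, ((0, 0), (1, 1), (1, 1)), mode="edge")
    dx = P[:, 1:-1, 2:] - P[:, 1:-1, :-2]  # F(x+1, y) - F(x-1, y)
    dy = P[:, 2:, 1:-1] - P[:, :-2, 1:-1]  # F(x, y+1) - F(x, y-1)
    return (np.abs(dx) + np.abs(dy)).sum(axis=0)


def find_seeds(E: np.ndarray, rel_thresh: float = 0.1) -> np.ndarray:
    """Hypothetical helper: (row, col) of 3x3 local maxima above rel_thresh * max(E)."""
    H, W = E.shape
    P = np.pad(E, 1, mode="constant", constant_values=-np.inf)
    # All nine shifted views of each pixel's 3x3 neighbourhood (centre included).
    neigh = np.stack([P[i:i + H, j:j + W] for i in range(3) for j in range(3)])
    is_local_max = E >= neigh.max(axis=0)  # ties on a plateau all count
    return np.argwhere(is_local_max & (E > rel_thresh * E.max()))


# Toy example: 4 channels, one bright pixel at (8, 8).
F = np.zeros((4, 16, 16))
F[:, 8, 8] = 1.0
E = energy_map(F)
seeds = find_seeds(E)
```

In the toy input, the central differences cancel at the bright pixel itself, so the energy forms a ring and the four seeds land on its direct neighbours rather than on the pixel.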
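The walk and the bilinear feature sampling can be sketched in the same way. The update follows the walk equation as written (a normalized step along $+\nabla\mathcal{E}^{(l)}$); the Sobel kernel, the clamping of out-of-range positions, and the names `sobel_grad`, `bilinear`, and `trace_path` are assumptions made for illustration.

```python
import numpy as np


def sobel_grad(E: np.ndarray):
    """Sobel derivatives (d/drow, d/dcol) of a 2-D map, with edge padding (assumed)."""
    H, W = E.shape
    P = np.pad(E, 1, mode="edge")

    def s(i, j):  # view of E shifted by (i - 1, j - 1)
        return P[i:i + H, j:j + W]

    g_col = (s(0, 2) + 2 * s(1, 2) + s(2, 2)) - (s(0, 0) + 2 * s(1, 0) + s(2, 0))
    g_row = (s(2, 0) + 2 * s(2, 1) + s(2, 2)) - (s(0, 0) + 2 * s(0, 1) + s(0, 2))
    return g_row, g_col


def bilinear(A: np.ndarray, r: float, c: float):
    """Sample A[..., H, W] at fractional (r, c); positions are clamped to the grid."""
    H, W = A.shape[-2:]
    r = min(max(float(r), 0.0), H - 1.0)
    c = min(max(float(c), 0.0), W - 1.0)
    r0, c0 = int(r), int(c)  # floor, since r and c are non-negative here
    r1, c1 = min(r0 + 1, H - 1), min(c0 + 1, W - 1)
    ar, ac = r - r0, c - c0
    return ((1 - ar) * (1 - ac) * A[..., r0, c0] + (1 - ar) * ac * A[..., r0, c1]
            + ar * (1 - ac) * A[..., r1, c0] + ar * ac * A[..., r1, c1])


def trace_path(F, E, seed, length=16, eta=1.0, eps=1e-6):
    """Hypothetical walk p_{j+1} = p_j + eta * g / (||g|| + eps); returns (length, C)."""
    g_row, g_col = sobel_grad(E)
    p = np.asarray(seed, dtype=float)  # current position (row, col)
    feats = []
    for _ in range(length):
        feats.append(bilinear(F, p[0], p[1]))  # f_j sampled at p_j
        g = np.array([bilinear(g_row, p[0], p[1]), bilinear(g_col, p[0], p[1])])
        p = p + eta * g / (np.linalg.norm(g) + eps)  # normalized ascent step
    return np.stack(feats)


# Toy example: energy increases to the right, so the walk moves along +col.
cols = np.tile(np.arange(16.0), (16, 1))
F = np.stack([cols, cols.T])  # channel 0 = column index, channel 1 = row index
path = trace_path(F, E=cols, seed=(5, 2), length=5)
```

Because the gradient is normalized, each step has length close to $\eta$ whatever the local gradient magnitude, so $\eta$ directly controls how far a trajectory reaches in $L$ steps; only where the gradient vanishes does $\epsilon$ keep the update finite, and the walk then stalls.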

3. Integration within TAPM-Net Architecture

Each trajectory sequence $\mathcal{P}^{(l)}$ serves as input to the Trajectory-Aware State Block (TASB), which applies a Mamba-based state-space recurrence along the path. Within TASB, parameterized updates via the matrices $\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}$ model dynamic propagation and velocity-constrained diffusion. The resulting trajectory states $y_j$ are concatenated with the original sampled features, word-level embeddings $\mathbf{F}_w(x_j,y_j)$, and sentence-level embeddings $\mathbf{F}_s(x_j,y_j)$, then linearly projected and scattered back onto the spatial feature map to form $\hat{\mathbf{F}}^{(l)}$; when several trajectory points land on the same pixel, their contributions are averaged. A residual fusion step then merges the trajectory-enhanced features back into the backbone:

$$\mathbf{F}_{\text{final}}^{(l)} = \mathbf{F}^{(l)} + \lambda\,\hat{\mathbf{F}}^{(l)}$$

where $\lambda$ is a fusion hyperparameter. Because the fused map keeps the shape of $\mathbf{F}^{(l)}$, this step preserves global coherence and remains compatible with the U-Net skip connections.
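To illustrate the kind of recurrence TASB applies, the sketch below runs a plain linear state-space model along a single trajectory, $h_j = \mathbf{A} h_{j-1} + \mathbf{B} f_j$ and $y_j = \mathbf{C} h_j + \mathbf{D} f_j$. It is deliberately simpler than a Mamba block, which makes $\mathbf{B}$, $\mathbf{C}$, and the discretization step input-dependent; the velocity constraints are not modeled, and `ssm_scan` is a hypothetical name.

```python
import numpy as np


def ssm_scan(P, A, B, C, D):
    """Hypothetical linear state-space recurrence along one trajectory.

    h_j = A h_{j-1} + B f_j,  y_j = C h_j + D f_j,  with h_0 = 0.
    P: (L, d_in); A: (N, N); B: (N, d_in); C: (d_out, N); D: (d_out, d_in).
    Returns the (L, d_out) sequence of outputs y_j.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for f in P:                  # visit path points in order p_1 ... p_L
        h = A @ h + B @ f        # the state carries information along the path
        ys.append(C @ h + D @ f)
    return np.stack(ys)


# Scalar example: an impulse at the first path step decays by A = 0.5 per step.
y = ssm_scan(np.array([[1.0], [0.0], [0.0]]),
             A=np.array([[0.5]]), B=np.array([[1.0]]),
             C=np.array([[1.0]]), D=np.array([[0.0]]))
```

The scalar example shows the basic mechanism: information injected at one point of a trajectory persists, with decay set by $\mathbf{A}$, into the states of later points on the same path.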
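The scatter-with-averaging and residual fusion steps can be sketched as follows. The sketch assumes each projected token is written to the nearest pixel of its trajectory position (bilinear splatting would be another reasonable choice) and takes the projected tokens as given; `scatter_mean` and `residual_fuse` are hypothetical names.

```python
import numpy as np


def scatter_mean(tokens, coords, H, W):
    """Hypothetical: write (N, C) tokens to nearest pixels, averaging collisions."""
    acc = np.zeros((tokens.shape[1], H, W))
    cnt = np.zeros((H, W))
    rc = np.clip(np.rint(coords).astype(int), 0, [H - 1, W - 1])  # nearest pixel
    for t, (r, c) in zip(tokens, rc):
        acc[:, r, c] += t
        cnt[r, c] += 1
    return acc / np.maximum(cnt, 1)  # unvisited pixels stay zero


def residual_fuse(F, F_hat, lam=0.5):
    """F_final = F + lam * F_hat, where lam plays the role of lambda."""
    return F + lam * F_hat


# Two tokens collide at pixel (1, 1); a third lands on (0, 0).
tokens = np.array([[2.0], [4.0], [6.0]])
coords = np.array([[1.0, 1.0], [1.2, 0.9], [0.0, 0.0]])
F_hat = scatter_mean(tokens, coords, H=2, W=2)
F_final = residual_fuse(np.ones((1, 2, 2)), F_hat, lam=0.5)
```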

PGM also computes an auxiliary perturbation response map $\hat{\mathbf{G}} \in \mathbb{R}^{H \times W}$ by re-projecting trajectory tokens onto the grid and normalizing by how often each pixel is visited. This map is supervised with the ground-truth mask via binary cross-entropy, which encourages the extracted trajectories to align spatially with true object regions.
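A sketch of the auxiliary response map and its supervision, assuming each trajectory token is projected to a single logit, tokens are assigned to their nearest pixel, and pixels that no trajectory visits receive zero response. `response_map` and `bce` are hypothetical helpers, not functions from TAPM-Net.

```python
import numpy as np


def response_map(logits, coords, H, W):
    """Hypothetical G_hat: mean sigmoid response of the tokens visiting each pixel."""
    rc = np.clip(np.rint(coords).astype(int), 0, [H - 1, W - 1])  # nearest pixel
    total = np.zeros((H, W))
    visits = np.zeros((H, W))
    # np.add.at accumulates correctly when several tokens hit the same pixel.
    np.add.at(total, (rc[:, 0], rc[:, 1]), 1.0 / (1.0 + np.exp(-logits)))
    np.add.at(visits, (rc[:, 0], rc[:, 1]), 1.0)
    return total / np.maximum(visits, 1.0)  # normalize by visit frequency


def bce(prob, mask, eps=1e-7):
    """Mean binary cross-entropy between G_hat and a {0, 1} ground-truth mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(p) + (1 - mask) * np.log(1 - p)))


# Two tokens visit (0, 0) with logit 0; one visits (1, 1) with logit 10.
G_hat = response_map(np.array([0.0, 0.0, 10.0]),
                     np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]), H=2, W=2)
loss = bce(G_hat, np.array([[1.0, 0.0], [0.0, 1.0]]))
```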

4. Hyperparameters and Computational Characteristics

PGM incorporates several tunable hyperparameters:

  • Number of encoder stages: 4
  • Trajectory length $L$: typically 10–20 steps
  • Step size: $\eta \in [0.5, 2.0]$
  • Stability constant: $\epsilon = 10^{-6}$
  • Seed threshold: $0.1 \times \max \mathcal{E}^{(l)}$
  • Fusion weight: $\lambda \in [0.1, 1.0]$
  • Feature map resolution: $(H_l, W_l)$ halves at each successive stage
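For experimentation, these hyperparameters can be grouped into a small configuration object. The class and field names are hypothetical, the defaults are simply values chosen inside the reported ranges, and `stage_resolutions` assumes the first stage runs at the input resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PGMConfig:
    """Hypothetical container for the PGM hyperparameters."""
    num_stages: int = 4
    path_length: int = 16          # trajectory length L, typically 10-20
    step_size: float = 1.0         # eta, reported range [0.5, 2.0]
    eps: float = 1e-6              # stability constant in the walk update
    seed_rel_thresh: float = 0.1   # seeds must exceed 0.1 * max(E)
    fusion_weight: float = 0.5     # lambda, reported range [0.1, 1.0]

    def stage_resolutions(self, H: int, W: int) -> list[tuple[int, int]]:
        """(H_l, W_l) per stage, halving each time; stage 1 at input size (assumed)."""
        return [(H // 2**l, W // 2**l) for l in range(self.num_stages)]


cfg = PGMConfig()
sizes = cfg.stage_resolutions(256, 256)
```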

Per stage, the cost is $\mathcal{O}(C_l H_l W_l)$ for the energy map and $\mathcal{O}(N_s L C_l)$ for the trajectory walks, where $N_s$ is the number of seeds. Because $N_s$ stays small (typically about 10–40) and $L$ and $C_l$ are moderate, the overall overhead is substantially lower than that of full self-attention, whose cost grows quadratically with the number of spatial positions, $\mathcal{O}((H_l W_l)^2 C_l)$.
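A back-of-the-envelope count makes the comparison concrete. The stage dimensions used ($C_l = 64$, $H_l = W_l = 128$, $N_s = 40$, $L = 20$) are hypothetical, and both counts drop constant factors.

```python
def pgm_ops(C: int, H: int, W: int, n_seeds: int, length: int) -> int:
    """Rough per-stage operation count for PGM (constant factors dropped)."""
    energy = C * H * W             # O(C_l H_l W_l): energy map
    walks = n_seeds * length * C   # O(N_s L C_l): sampling along trajectories
    return energy + walks


def self_attention_ops(C: int, H: int, W: int) -> int:
    """Rough count for full self-attention: every position attends to every other."""
    return (H * W) ** 2 * C


# Hypothetical stage: C_l = 64, H_l = W_l = 128, N_s = 40 seeds, L = 20 steps.
pgm = pgm_ops(64, 128, 128, n_seeds=40, length=20)
attn = self_attention_ops(64, 128, 128)
```

Here PGM needs about $1.1 \times 10^6$ operations against roughly $1.7 \times 10^{10}$ for full self-attention, a gap of more than four orders of magnitude.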

5. Empirical Evaluation

PGM yields significant performance improvements on public ISTD benchmarks. On NUAA-SIRST, adding PGM to a U-Net backbone raises Intersection-over-Union (IoU) from 63.12% to 72.45%, reduces the false alarm rate from 43.23% to 8.78%, and improves the detection probability $P_d$ from 91.23% to 95.63%. Ablation studies show the contribution of each component:

Variant                    IoU (%)   False Alarm Rate (%)   $P_d$ (%)
U-Net (baseline)           63.12     43.23                  91.23
Energy Map Only            50.44     –                      –
Energy Map + Trajectory    74.78     –                      –
Full PGM + Fusion          81.04     –                      –

(– indicates a value not reported for that variant.)

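The ablation indicates that the gains come from combining energy-based estimation, gradient-guided trajectory extraction, and spatially fused back-projection rather than from energy mapping alone: the energy map by itself falls below the baseline (50.44% vs. 63.12% IoU), whereas adding trajectory extraction and then fusion raises IoU to 74.78% and 81.04%, respectively.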

6. Significance and Implications

PGM offers a physically motivated, data-driven strategy for detecting and modeling the subtle spatial ripples that small targets induce in high-dimensional feature spaces. By converting raw perturbations into explicit trajectories, it allows the subsequent state-space modeling (via TASB) to exploit both local diffusion behavior and global semantic coherence. A plausible implication is that PGM's methodology may generalize to other vision tasks involving weak or sparse signal propagation, particularly where standard backbones fail to encode directional perturbation propagation. The reported accuracy gains on ISTD benchmarks, together with its modest computational overhead, make PGM a distinctive contribution to trajectory-centric deep feature analysis (Xie et al., 9 Jan 2026).
