
TAPM-Net for Infrared Target Detection

Updated 16 January 2026
  • The paper introduces TAPM-Net, a novel neural architecture that significantly improves infrared small target detection by modeling spatial diffusion of feature perturbations.
  • The Perturbation-guided Path Module (PGM) computes energy maps and traces gradient-guided trajectories to isolate subtle target signals from clutter.
  • The Trajectory-Aware State Block (TASB) employs state-space dynamics for context-sensitive fusion, achieving notable performance gains with improved IoU and reduced false alarms.

Trajectory-Aware Mamba Propagation Network (TAPM-Net) is a neural architecture specifically designed for infrared small target detection (ISTD), addressing longstanding challenges such as weak signal contrast, limited target spatial extent, and highly cluttered backgrounds. TAPM-Net differentiates itself from conventional CNN and vision transformer (ViT) approaches by providing explicit mechanisms to trace and model the spatial diffusion behavior of feature perturbations induced by small targets within the feature space, thereby enhancing the network’s ability to distinguish true signals from structured noise (Xie et al., 9 Jan 2026).

1. Conceptual Foundations and Motivation

ISTD tasks are hindered by the inability of standard backbone architectures to differentiate subtle target features from pervasive background clutter, especially when targets do not manifest as isolated “bright spots” but rather as sources of localized disturbance whose influences propagate spatially. Existing CNNs and ViTs can extract local contrast, but they are not equipped to model how target-induced feature perturbations are spatially diffused or to track their layer-wise evolution. TAPM-Net rectifies this deficit through the integration of two components:

  • Perturbation-guided Path Module (PGM): Models spatial propagation by constructing energy fields and tracing directional feature trajectories from multi-level feature representations.
  • Trajectory-Aware State Block (TASB): Implements a state-space propagation paradigm, leveraging Mamba-style dynamics for context-sensitive state transitions along the detected trajectories.

This design enables TAPM-Net to realize anisotropic and context-sensitive propagation in feature space, maintaining global spatial coherence with reduced computational burden compared to full self-attention mechanisms (Xie et al., 9 Jan 2026).

2. Perturbation-Guided Path Module (PGM)

PGM operates atop each backbone encoder stage, capturing the physical intuition that an infrared small target acts as a localized disturbance diffusing through its neighborhood rather than a static pixel artifact. For a feature map $F^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$ at stage $l$, PGM builds a perturbation energy map $E^{(l)} \in \mathbb{R}^{H_l \times W_l}$ by aggregating absolute directional channel differences:

$$E^{(l)}(x, y) = \sum_{c=1}^{C_l} \left( \left|F_c^{(l)}(x+1, y) - F_c^{(l)}(x-1, y)\right| + \left|F_c^{(l)}(x, y+1) - F_c^{(l)}(x, y-1)\right| \right)$$
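The energy map can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `perturbation_energy` is hypothetical, and replicating edge values at the border is an assumption, since the source does not say how boundary pixels are handled.

```python
import numpy as np

def perturbation_energy(feat: np.ndarray) -> np.ndarray:
    """Map a (C, H, W) feature map to an (H, W) perturbation energy map."""
    # Assumption: replicate border values so both neighbours exist everywhere.
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)), mode="edge")
    # Differences between the two opposite neighbours along each spatial axis.
    d_rows = padded[:, 2:, 1:-1] - padded[:, :-2, 1:-1]
    d_cols = padded[:, 1:-1, 2:] - padded[:, 1:-1, :-2]
    # Sum the absolute differences over channels.
    return (np.abs(d_rows) + np.abs(d_cols)).sum(axis=0)

# A single bright pixel in an otherwise flat map.
feat = np.zeros((1, 5, 5))
feat[0, 2, 2] = 1.0
energy = perturbation_energy(feat)
```

With this symmetric difference, the energy is raised at the four direct neighbours of the bright pixel, while the pixel itself keeps zero energy because its two neighbours along each axis are equal.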

Positions attaining high $E^{(l)}$ values typically correspond to spatial discontinuities induced by true targets. Gradient-following walks are launched from the $K$ top local maxima ($p_1^k$) of $E^{(l)}$, tracing paths via:

$$p_{j+1} = p_j + \eta \frac{\nabla E^{(l)}(p_j)}{\|\nabla E^{(l)}(p_j)\|_2 + \epsilon}$$

where $\eta$ is a step-size hyperparameter and $\epsilon$ prevents division by zero. Walks are repeated either for a fixed length $L$ or halted adaptively when the local energy decays below threshold $\tau$. Along each trajectory, feature vectors $f_j$ are extracted using bilinear interpolation and stacked as token sequences $P^{(l)} = [f_1, \dots, f_L] \in \mathbb{R}^{L \times C_l}$, encoding how local perturbations propagate in feature space.
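The seeding, walking, and sampling steps described above can be sketched as follows. All function names are hypothetical, and several details are assumptions not stated in the source: a 3×3 neighbourhood test for local maxima, `np.gradient` as the finite-difference gradient of the energy map, and clipping of positions to the grid. The seed is passed in explicitly so the update rule can be tried from any starting point.

```python
import numpy as np

def top_k_local_maxima(energy, K):
    """Return up to K (row, col) peaks, strongest first (3x3 test is an assumption)."""
    energy = np.asarray(energy, dtype=float)
    H, W = energy.shape
    padded = np.pad(energy, 1, mode="constant", constant_values=-np.inf)
    shifted = [padded[dr:dr + H, dc:dc + W] for dr in range(3) for dc in range(3)]
    is_peak = energy >= np.max(shifted, axis=0)  # flat plateaus also count as peaks
    rows, cols = np.nonzero(is_peak)
    order = np.argsort(energy[rows, cols])[::-1][:K]
    return [(int(rows[i]), int(cols[i])) for i in order]

def bilinear(grid, p):
    """Sample a (..., H, W) array at a continuous (row, col) position."""
    H, W = grid.shape[-2:]
    r = float(np.clip(p[0], 0, H - 1))
    c = float(np.clip(p[1], 0, W - 1))
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    r1, c1 = min(r0 + 1, H - 1), min(c0 + 1, W - 1)
    wr, wc = r - r0, c - c0
    top = (1 - wc) * grid[..., r0, c0] + wc * grid[..., r0, c1]
    bottom = (1 - wc) * grid[..., r1, c0] + wc * grid[..., r1, c1]
    return (1 - wr) * top + wr * bottom

def trace_trajectory(feat, energy, seed, eta=1.0, eps=1e-6, L=8, tau=None):
    """Follow the normalised energy gradient and collect one feature token per step."""
    H, W = energy.shape
    grad_r, grad_c = np.gradient(energy)  # finite-difference gradient of the energy
    p = np.asarray(seed, dtype=float)
    points, tokens = [], []
    for _ in range(L):
        if tau is not None and bilinear(energy, p) < tau:
            break  # adaptive stop once the local energy falls below tau
        points.append(p.copy())
        tokens.append(bilinear(feat, p))
        g = np.array([bilinear(grad_r, p), bilinear(grad_c, p)])
        p = np.clip(p + eta * g / (np.linalg.norm(g) + eps), 0, [H - 1, W - 1])
    return np.array(points), np.array(tokens)

# Toy example: a smooth energy bump centred at (4, 4) on a 9x9 grid.
rr, cc = np.mgrid[0:9, 0:9]
energy = -((rr - 4.0) ** 2 + (cc - 4.0) ** 2)
feat = np.stack([rr, cc, rr + cc]).astype(float)
seeds = top_k_local_maxima(energy, K=1)
points, tokens = trace_trajectory(feat, energy, seed=(1, 1), eta=1.0, L=8)
```

Here `tokens` plays the role of the token sequence for one trajectory, with one row per visited position and one column per channel.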

An auxiliary “perturbation response map” $\hat{G}$ is constructed by mapping these trajectories back onto the $H \times W$ grid, which is used for supervision via an auxiliary loss promoting heightened network sensitivity to genuine targets (Xie et al., 9 Jan 2026).
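The source does not specify how trajectories are mapped back onto the grid or what form the auxiliary loss takes. As one possible reading, the sketch below marks the nearest grid cell of every trajectory point with 1; the function name `response_map` and the nearest-cell rule are assumptions made purely for illustration.

```python
import numpy as np

def response_map(trajectories, shape):
    """Rasterise trajectory points onto an (H, W) grid (nearest-cell rule is an assumption)."""
    H, W = shape
    G = np.zeros((H, W), dtype=float)
    for pts in trajectories:
        idx = np.rint(np.asarray(pts, dtype=float)).astype(int)
        rows = np.clip(idx[:, 0], 0, H - 1)
        cols = np.clip(idx[:, 1], 0, W - 1)
        G[rows, cols] = 1.0  # mark every visited cell
    return G

# One trajectory with two continuous (row, col) points on a 4x4 grid.
G_hat = response_map([[(0.2, 0.1), (1.6, 1.4)]], shape=(4, 4))
```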

3. Trajectory-Aware State Block (TASB)

TASB functions as a Mamba-style state-space model, processing the trajectory sequences obtained from PGM. Along each feature trajectory $P^{(l)}$, TASB propagates a hidden state dynamically, allowing for modeling of anisotropic and context-sensitive propagation. The state outputs are reprojected onto the grid, averaged at trajectory collisions, and fused back into the backbone features:

$$F_{\text{final}}^{(l)} = F^{(l)} + \lambda \hat{H}^{(l)}$$

where $\hat{H}^{(l)}$ is the TASB-enhanced map and $\lambda$ is a learnable fusion weight. This fusion mechanism imparts trajectory-level context while preserving the original spatial layout, resulting in decoded representations with sharper and more coherent target segmentation masks.
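The exact TASB parametrization is not given here, so the following is only a sketch under stated assumptions. `selective_scan` is a simplified diagonal state-space recurrence with an input-dependent step size, loosely in the style of Mamba's selective scan; the weight matrices `W_delta`, `W_B`, `W_C` and the state matrix `A` are hypothetical stand-ins for learned parameters. `fuse` illustrates the reprojection with averaging at collisions and the residual fusion, using a fixed `lam` in place of the learnable weight and a nearest-cell reprojection that is also an assumption.

```python
import numpy as np

def selective_scan(tokens, W_delta, A, W_B, W_C):
    """Run a simplified state-space recurrence over an (L, C) token sequence."""
    L, C = tokens.shape
    h = np.zeros((C, A.shape[1]))              # hidden state, one row per channel
    out = np.empty((L, C))
    for j, x in enumerate(tokens):
        delta = np.logaddexp(0.0, x @ W_delta)  # softplus step size, shape (C,)
        B = x @ W_B                             # input projection, shape (N,)
        Cm = x @ W_C                            # output projection, shape (N,)
        # Decay the state by exp(delta * A), then inject the current token.
        h = np.exp(delta[:, None] * A) * h + (delta * x)[:, None] * B[None, :]
        out[j] = h @ Cm
    return out

def fuse(feat, trajectories, outputs, lam=0.1):
    """Scatter per-step outputs onto the grid, average collisions, add to feat."""
    C, H, W = feat.shape
    acc = np.zeros((C, H, W))
    cnt = np.zeros((H, W))
    for pts, out in zip(trajectories, outputs):
        idx = np.rint(np.asarray(pts, dtype=float)).astype(int)
        for (r, c), y in zip(idx, out):
            r = min(max(int(r), 0), H - 1)
            c = min(max(int(c), 0), W - 1)
            acc[:, r, c] += y
            cnt[r, c] += 1
    H_hat = acc / np.maximum(cnt, 1)            # mean where trajectories collide
    return feat + lam * H_hat

# Toy usage with random stand-in parameters (C=4 channels, N=3 states, L=5 steps).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))
A = -np.exp(rng.normal(size=(4, 3)))            # negative entries give a decaying state
W_delta = rng.normal(size=(4, 4))
W_B = rng.normal(size=(4, 3))
W_C = rng.normal(size=(4, 3))
scan_out = selective_scan(tokens, W_delta, A, W_B, W_C)

# Two one-point trajectories that land on the same cell are averaged.
fused = fuse(np.zeros((2, 3, 3)),
             [[(1.0, 1.0)], [(1.2, 0.9)]],
             [np.array([[1.0, 1.0]]), np.array([[3.0, 3.0]])],
             lam=0.5)
```

Because the fusion is residual, grid cells that no trajectory visits keep their original backbone features unchanged.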

4. Algorithmic Workflow and Complexity

PGM operates with linear computational overhead per stage, substantially lower than attention-based models. Specifically, for each stage $l$:

  • Energy map computation: $O(C_l \cdot H_l \cdot W_l)$
  • Local maxima detection: $O(H_l \cdot W_l)$
  • Trajectory tracing for $K$ seeds, each length $L$: $O(K \cdot L)$
  • Bilinear interpolation along trajectories: $O(K \cdot L)$

With standard backbones, where $C_l \cdot H_l \cdot W_l \gg K \cdot L$, this approach is substantially more computationally efficient than full spatial self-attention. All steps are differentiable and modular, facilitating straightforward implementation in PyTorch or TensorFlow (Xie et al., 9 Jan 2026).
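To make the comparison concrete, the leading-order operation counts can be tallied for a hypothetical stage. The sizes below (64 channels on a 64×64 map, 10 seeds, 16 steps) are illustrative assumptions, not values from the paper, and the self-attention count uses the standard pairwise-token estimate of (H·W)²·C.

```python
def pgm_cost(C, H, W, K, L):
    """Sum the leading-order terms listed for one PGM stage."""
    energy_map = C * H * W      # energy map computation
    local_maxima = H * W        # local maxima detection
    tracing = K * L             # trajectory tracing
    interpolation = K * L       # bilinear interpolation along trajectories
    return energy_map + local_maxima + tracing + interpolation

def self_attention_cost(C, H, W):
    """Leading-order cost of full spatial self-attention over H*W tokens."""
    n_tokens = H * W
    return n_tokens * n_tokens * C

# Hypothetical stage dimensions, chosen only for illustration.
pgm = pgm_cost(C=64, H=64, W=64, K=10, L=16)        # 266560
attention = self_attention_cost(C=64, H=64, W=64)   # 1073741824
ratio = attention / pgm
```

For these assumed sizes the attention estimate is roughly four thousand times larger, and the energy-map term dominates the PGM total.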

| Module | Input/Output Structure | Computational Order |
|---|---|---|
| PGM | $F^{(l)} \rightarrow P^{(l)}$, $\hat{G}$ | $O(C_l H_l W_l)$, $O(KL)$ |
| TASB | $P^{(l)} \rightarrow \hat{H}^{(l)}$ | (state-propagation per trajectory) |
| Fusion | $F^{(l)}, \hat{H}^{(l)} \rightarrow F_{\text{final}}^{(l)}$ | Linear per stage |

5. Hyperparameters and Tuning Roles

  • $\eta$ (step size): Controls the granularity of trajectory tracing; higher values yield coarser, longer jumps.
  • $\epsilon$ (stability): Maintains numerical stability in gradient normalization (values $10^{-6}$ to $10^{-8}$ typical).
  • $L$ (trajectory length): Determines traced context range; typically 8–16 steps suffice.
  • $K$ (number of seeds): Sets the number of high-energy maxima followed per map (often 5–10).
  • Energy decay threshold ($\tau$): Sets adaptive stopping point for trajectory propagation ($\tau = 0.2$ commonly used).

These hyperparameters directly impact propagation fidelity, context scope, and computational footprint.
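One convenient way to keep these settings together is a small validated configuration object. The class name `PGMConfig` is hypothetical. The defaults for the stability term, trajectory length, seed count, and threshold are taken from the ranges quoted above, while the step-size default of 1.0 is an assumption because no value is given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PGMConfig:
    eta: float = 1.0    # step size; assumed default, the source states no value
    eps: float = 1e-6   # stability term, within the quoted 1e-6 to 1e-8 range
    L: int = 8          # trajectory length, within the quoted 8-16 range
    K: int = 5          # number of seeds, within the quoted 5-10 range
    tau: float = 0.2    # energy decay threshold quoted as commonly used

    def __post_init__(self):
        # Reject settings that would make the walk ill-defined.
        if self.eta <= 0 or self.eps <= 0:
            raise ValueError("eta and eps must be positive")
        if self.L < 1 or self.K < 1:
            raise ValueError("L and K must be at least 1")

default_cfg = PGMConfig()
long_range_cfg = PGMConfig(L=16, K=10)
```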

6. Experimental Evaluation

Evaluations on the NUAA-SIRST dataset demonstrate marked improvements from introducing TAPM-Net’s modules. When substituting a vanilla U-Net encoder with “U-Net+PGM,” mean Intersection over Union (IoU) increases from 63.12% to 72.45%, and false alarm rate drops from 43.23% to 8.78%. Further incorporating TASB (“U-Net+PGM+TASB”) results in 81.04% IoU and only 1.98% false alarms. Disaggregated ablations reveal:

  • Using only energy maps: IoU of 50.44%
  • Adding trajectory sampling: IoU rises to 74.78%
  • Integrating full trajectory fusion: IoU reaches 81.04%

This pattern validates that both gradient-guided path extraction and trajectory-to-grid fusion are necessary for optimal performance. These experimental results confirm that TAPM-Net’s distinctive modules yield significant gains in ISTD scenarios, establishing new baselines (Xie et al., 9 Jan 2026).
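For readers unfamiliar with the headline metric, Intersection over Union on binary masks can be computed as in the sketch below. The function name `iou` is illustrative, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the source.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, target).sum() / union)

# The prediction covers two pixels, one of which is a true target pixel.
score = iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])  # 0.5
```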

7. Broader Implications and Extensions

PGM gives TAPM-Net a differentiable, physically interpretable mechanism for harvesting multi-scale propagation cues, making it modular and extendable to other vision problems involving small-object and saliency detection. Its linear time complexity and modularity allow straightforward adaptation in modern frameworks. The approach suggests that disturbance propagation modeling, as instantiated here, is a valuable signal beyond ISTD, pointing toward applications in domains where small, subtle features must be robustly disentangled from structured noise.

A plausible implication is that trajectory-aware modeling could inspire advances in related fields such as anomaly detection, medical imaging, and remote sensing, wherever physically grounded spatial propagation mechanisms outperform conventional local or global context aggregation strategies (Xie et al., 9 Jan 2026).

References (1)
