
CASR-PAN: Adaptive Sparse Routing Network

Updated 5 January 2026
  • The paper introduces CASR-PAN, which leverages content-adaptive sparse routing to efficiently fuse multi-scale features for improved infrared gas leak detection.
  • It utilizes an Importance Estimator and dynamic routing weights through spatial gating and fusion blocks to selectively transmit salient information.
  • Empirical findings show CASR-PAN outperforms dense fusion methods, delivering higher AP scores and reducing GFLOPs and parameters in challenging detection tasks.

The Content-Adaptive Sparse Routing Path Aggregation Network (CASR-PAN) is a neural feature aggregation architecture designed for cross-scale feature fusion in detection networks, particularly within the context of the PEG-DRNet framework for infrared gas leak detection. CASR-PAN implements sparse, content-adaptive routing of multi-scale features, selectively propagating information across spatial locations and resolution levels based on dynamic, data-driven cues. This architecture addresses redundancies and inefficiencies in traditional dense path aggregation necks by leveraging spatial gating and explicit modulation mechanisms, yielding improvements in both accuracy and computational efficiency in challenging object detection scenarios (Li et al., 29 Dec 2025).

1. Architectural Structure and Component Design

CASR-PAN is situated between the feature extraction backbone (Physics–Edge Hybrid) and the detection decoder (RT-DETR). Its role is to fuse multi-scale feature maps, specifically $\{P_3, P_4, P_5\}$, using path-specific, content-adaptive control. The core components are:

  • Importance Estimator (IE): Takes a feature map $X \in \mathbb{R}^{B \times C \times H \times W}$ and generates an importance tensor $I$ by fusing three cues:
    • Global ($G$): global-average pooling followed by $1 \times 1$ convolutions and a nonlinearity.
    • Local ($L$): local context via a $3 \times 3$ convolution and further transformation.
    • Diversity ($D$): $1 \times 1$ convolution, ReLU, sigmoid, and per-channel variance.
    • Softmax-normalized scalar weights $(w_g, w_l, w_d)$ combine these into $I = \sigma(w_g \cdot G + w_l \cdot L + w_d \cdot D)$.
  • Routing Weight Generation: A $1 \times 1$ convolution transforms $I$ into four spatially varying routing maps $\{W_1, W_2, W_3, W_4\} \in [0,1]^{B \times 1 \times H \times W}$, assigned to four functional paths.
  • Adaptive Information Modulation for Fusion (AIMM-F): Blends local ($F_\text{local}$) and transported ($F_\text{transport}$) features:

$$Y = F_\text{local} \oplus \Big(F_\text{transport} \odot \big[W \cdot (BA + \sigma(\mathrm{std}(F_\text{transport})))\big]\Big)$$

where $\oplus$ denotes addition, $\odot$ is broadcasted multiplication, $BA = 0.5$ is a bias, and $\mathrm{std}(F_\text{transport})$ is the per-channel standard deviation.

  • Adaptive Information Modulation for Self (AIMM-S): Enhances a feature map via self-routing:

$$Y = F \odot \big[IDAS + W \cdot (BA + \sigma(\mathrm{std}(F)))\big]$$

where $IDAS = 1$ provides an identity baseline.

  • Explicit Routing Paths: Four defined flows:
    1. Deep ($P_5$) to mid ($P_4$) via AIMM-F and $W_1$.
    2. Deep ($P_5$) to shallow ($P_3$) via AIMM-F and $W_2$.
    3. Shallow ($P_3$) to mid ($P_4$) via AIMM-F and $W_3$.
    4. Mid ($P_4$) self-enhancement via AIMM-S and $W_4$.

The resulting routed outputs are aggregated and refined by shallow convolutional blocks before emission to the decoder.
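As an illustration, the cue fusion and the AIMM-F blend above can be sketched in NumPy. The $1 \times 1$ and $3 \times 3$ convolutions are replaced here by simple channel statistics, and all function names are illustrative rather than taken from the paper; this is a minimal sketch of the published equations, not the published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def importance(X, w_g, w_l, w_d):
    """Fuse global (G), local (L), and diversity (D) cues into importance I.

    X: feature map of shape (C, H, W). The paper's 1x1/3x3 convolutions
    are replaced by channel means/variances purely for illustration.
    """
    C, H, W = X.shape
    G = np.full((1, H, W), X.mean())          # global-average-pooling cue
    L = X.mean(axis=0, keepdims=True)         # stand-in for a local 3x3 conv
    D = X.var(axis=0, keepdims=True)          # per-location channel variance
    w = np.exp([w_g, w_l, w_d])               # softmax-normalised scalars
    w /= w.sum()
    return sigmoid(w[0] * G + w[1] * L + w[2] * D)

def aimm_f(F_local, F_transport, W, BA=0.5):
    """AIMM-F: Y = F_local + F_transport * [W * (BA + sigmoid(std(F_transport)))]."""
    std = F_transport.std(axis=(1, 2), keepdims=True)   # per-channel std
    return F_local + F_transport * (W * (BA + sigmoid(std)))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16, 16))
I = importance(X, 0.2, 0.5, 0.3)                        # shape (1, 16, 16)
Y = aimm_f(X, rng.standard_normal((8, 16, 16)), I)
print(I.shape, Y.shape)                                 # (1, 16, 16) (8, 16, 16)
```

The outer sigmoid keeps the importance map in $[0,1]$, which is what allows it to feed directly into routing-weight generation.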

2. Routing Formulation and Sparsity Mechanisms

The content-adaptive routing in CASR-PAN is implemented as a spatially varying convex blend at each location $(x, y)$:

$$F_\text{out}(x, y) = (1 - W(x, y))\, F_l(x, y) + W(x, y)\, F_h(x, y)$$

where $F_l$ and $F_h$ are the local and transported feature vectors, respectively, and $W(x, y) \in [0,1]$ is the routing weight. $W$ is generated by applying a sigmoid nonlinearity to the output of a $1 \times 1$ convolution over the fused importance $I$.

Sparsity arises implicitly: the sigmoid activation, together with the bias $BA = 0.5$, encourages low routing weights for uninformative regions. No explicit $L_1$ regularization or hard thresholding is used; paths are “sparse” in the practical sense that many $W(x, y)$ approach zero, effectively suppressing cross-scale propagation at those points.

A plausible implication is that this spatial sparsity focuses computation and transmission on salient structures, such as gas plume boundaries, rather than uniformly across the field of view.
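The convex blend itself is straightforward to sketch. In this minimal NumPy example, $W$ is a hand-crafted map in $[0,1]$ rather than a learned one, chosen so that only a small "salient" patch routes transported features:

```python
import numpy as np

def convex_blend(F_l, F_h, W):
    """Per-location convex blend: F_out(x,y) = (1 - W) * F_l + W * F_h.

    F_l, F_h: (C, H, W) local and transported features; W: (1, H, W) in [0, 1].
    Where W -> 0 the transported feature is suppressed (implicit sparsity).
    """
    return (1.0 - W) * F_l + W * F_h

rng = np.random.default_rng(1)
F_l = rng.standard_normal((4, 8, 8))
F_h = rng.standard_normal((4, 8, 8))
W = np.zeros((1, 8, 8))
W[:, 2:6, 2:6] = 0.9                     # transport only within a salient patch
out = convex_blend(F_l, F_h, W)
# Outside the patch (W = 0) the output equals the local feature exactly.
assert np.allclose(out[:, 0, 0], F_l[:, 0, 0])
```

This mirrors the practical sparsity described above: wherever $W$ is near zero, cross-scale transport is a no-op and only the local path survives.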

3. Network Dataflow and Functional Workflow

The canonical forward pass through CASR-PAN can be summarized as follows:

  1. Importance Estimation: For each input scale $P_s$ ($s \in \{3, 4, 5\}$), compute the global ($G$), local ($L$), and diversity ($D$) cues, and combine them with learned weights to form $I_s$.
  2. Routing Map Generation: Typically, $P_5$'s importance $I_5$ is used to generate all four routing weights through a $1 \times 1$ convolution and sigmoid.
  3. Routing and Fusion:
    • $P_5 \rightarrow P_4$: AIMM-F with $W_1$
    • $P_5 \rightarrow P_3$: AIMM-F with $W_2$
    • $P_3 \rightarrow P_4$: AIMM-F with $W_3$
    • $P_4$ self: AIMM-S with $W_4$
  4. Aggregation and Output: Routed outputs targeting the same scale are summed, passed through a light RepC3 block, and forwarded along with the unmodified $P_5$ to the detection decoder.

This workflow fuses features in a way that is both content- and location-adaptive, differing from traditional fixed or shared-weight aggregation strategies. Pseudocode for the full forward pass is supplied verbatim in the original source (Li et al., 29 Dec 2025).
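The four-path workflow can be wired together in a self-contained NumPy sketch. Up- and downsampling are replaced by nearest-neighbour stand-ins, the four routing maps are derived from $I_5$ with biased sigmoids in place of the paper's $1 \times 1$ convolution, and the final RepC3 refinement is omitted; this is a sketch under those assumptions, not the published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize(F, H, W):
    """Nearest-neighbour resize: stand-in for the paper's up/downsampling."""
    C, h, w = F.shape
    ri = np.arange(H) * h // H
    ci = np.arange(W) * w // W
    return F[:, ri][:, :, ci]

def aimm_f(F_local, F_transport, W, BA=0.5):
    std = F_transport.std(axis=(1, 2), keepdims=True)
    return F_local + F_transport * (W * (BA + sigmoid(std)))

def aimm_s(F, W, BA=0.5, IDAS=1.0):
    std = F.std(axis=(1, 2), keepdims=True)
    return F * (IDAS + W * (BA + sigmoid(std)))

def casr_pan_forward(P3, P4, P5, I5):
    """Sketch of the four explicit routing paths. The four maps W1..W4 are
    derived from I5 via biased sigmoids (illustrative stand-in for the
    1x1 conv + sigmoid); the RepC3 refinement step is omitted."""
    Wmaps = [sigmoid(I5 + b) for b in (-1.0, -0.5, 0.0, 0.5)]
    _, H3, W3 = P3.shape
    _, H4, W4 = P4.shape
    out3 = aimm_f(P3, resize(P5, H3, W3), resize(Wmaps[1], H3, W3))  # P5 -> P3
    mid = aimm_f(P4, resize(P5, H4, W4), resize(Wmaps[0], H4, W4))   # P5 -> P4
    mid = aimm_f(mid, resize(P3, H4, W4), resize(Wmaps[2], H4, W4))  # P3 -> P4
    out4 = aimm_s(mid, resize(Wmaps[3], H4, W4))                     # P4 self
    return out3, out4, P5                    # P5 passes through unmodified

rng = np.random.default_rng(2)
P3, P4, P5 = (rng.standard_normal(s) for s in [(8, 32, 32), (8, 16, 16), (8, 8, 8)])
I5 = rng.random((1, 8, 8))
o3, o4, o5 = casr_pan_forward(P3, P4, P5, I5)
print(o3.shape, o4.shape, o5.shape)   # (8, 32, 32) (8, 16, 16) (8, 8, 8)
```

Note how every output retains its input resolution: routing only modulates what is transported between scales, not the scales themselves.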

4. Computational Complexity and Efficiency Gains

CASR-PAN reduces the redundancy and parameter overhead typical of dense path aggregation networks such as BiFPN. Empirical analysis on the IIG dataset (input size $640 \times 640$) yields the following:

| Variant | AP | AP$_{50}$ | GFLOPs | Parameters |
| --- | --- | --- | --- | --- |
| RT-DETR-R18 | 26.8 | 77.8 | 56.9 | 19.87 M |
| PEG-DRNet (CASR) | 29.8 | 84.3 | 43.7 | 14.93 M |

CASR-PAN adds approximately $45.8$ GFLOPs and $16.94$ M parameters, substantially less than a BiFPN neck (which requires $64.3$ GFLOPs and $20.3$ M parameters). CASR-PAN thus saves about $13$ GFLOPs and $5$ M parameters while increasing AP$_{50}$ by roughly $6.5$ points over a dense neck, reflecting both reduced computational cost and the avoidance of redundant fusion (Li et al., 29 Dec 2025).

5. Ablation Studies and Empirical Findings

CASR-PAN yields substantial improvements in detection metrics and cross-scale discriminability:

  • Introducing the full CASR-PAN on a ResNet18 backbone increases AP from $26.8\%$ to $29.4\%$ and AP$_{50}$ from $77.8$ to $79.1$.
  • In combination with the Physics–Edge Hybrid Backbone (PEG-DRNet), CASR-PAN achieves AP $= 29.8\%$, AP$_{50} = 84.3$, and small-object AP$_S = 25.3\%$, all at $43.7$ GFLOPs and $14.9$ M parameters.
  • Path ablations: removing any of the defined routing flows results in a $1$–$2$ point reduction in AP.
  • When tested against PANet, BiFPN, and NAS-FPN under identical backbones, CASR-PAN achieves $+2.1$–$4.4$ point AP gains with $20$–$50\%$ fewer FLOPs.
  • Small-object AP$_S$ improves by $3$–$5$ points, and AP$_{75}$ rises from about $8\%$ (dense fusion) to $12.3\%$ with CASR-PAN, indicative of improved fine localization (Li et al., 29 Dec 2025).

These findings highlight the benefit of content-selective, sparse multi-scale feature propagation in detection tasks with diffuse, weakly-bounded targets.

6. Relation to Other Cross-Scale Fusion Methods

CASR-PAN is positioned within a landscape of cross-scale fusion methods such as BiFPN and PANet. Unlike these dense architectures, CASR-PAN individually weights and gates the propagation of multi-scale features based on spatial content and edge cues. This suggests that content-adaptive sparse routing is advantageous in domains where informative features are spatially concentrated or weakly localized, such as infrared gas-leak detection, where targets are often small and have indistinct boundaries.

The use of learned importance cues and route-specific gating advances over static or uniformly-applied fusion, allowing the system to prioritize salient features and minimize unnecessary computation. A plausible implication is that such adaptive strategies may generalize to other visual reasoning tasks requiring high sensitivity to fine local structures while maintaining efficiency.

7. Limitations and Prospective Extensions

The concept of “sparse routing” in CASR-PAN refers to implicit suppression of uninformative regions via sigmoid-based gating and an additive bias. No explicit $L_1$ regularization or hard sparsity constraints are applied in the published configuration, though such techniques could be explored for further efficiency. The architecture is tightly integrated with the edge and physics-based modules of PEG-DRNet, and its performance improvements are demonstrated specifically for infrared gas-plume detection.

A plausible extension would be to investigate explicit regularization for increased sparsity, or to generalize the routing mechanism for broader multi-task feature fusion. The effectiveness of CASR-PAN is empirically established against established path aggregation mechanisms, suggesting its potential as a general-purpose, efficient cross-scale routing module in detection architectures (Li et al., 29 Dec 2025).
