
CASR-PAN: Adaptive Sparse Routing Network

Updated 5 January 2026
  • The paper introduces CASR-PAN, which leverages content-adaptive sparse routing to efficiently fuse multi-scale features for improved infrared gas leak detection.
  • It utilizes an Importance Estimator and dynamic routing weights through spatial gating and fusion blocks to selectively transmit salient information.
  • Empirical findings show CASR-PAN outperforms dense fusion methods, delivering higher AP scores and reducing GFLOPs and parameters in challenging detection tasks.

The Content-Adaptive Sparse Routing Path Aggregation Network (CASR-PAN) is a neural feature aggregation architecture designed for cross-scale feature fusion in detection networks, particularly within the context of the PEG-DRNet framework for infrared gas leak detection. CASR-PAN implements sparse, content-adaptive routing of multi-scale features, selectively propagating information across spatial locations and resolution levels based on dynamic, data-driven cues. This architecture addresses redundancies and inefficiencies in traditional dense path aggregation necks by leveraging spatial gating and explicit modulation mechanisms, yielding improvements in both accuracy and computational efficiency in challenging object detection scenarios (Li et al., 29 Dec 2025).

1. Architectural Structure and Component Design

CASR-PAN is situated between the feature extraction backbone (Physics–Edge Hybrid) and the detection decoder (RT-DETR). Its role is to fuse multi-scale feature maps, specifically $\{P_3, P_4, P_5\}$, using path-specific, content-adaptive control. The core components are:

  • Importance Estimator (IE): Takes a feature map $X \in \mathbb{R}^{B \times C \times H \times W}$ and generates an importance tensor $I$ by fusing three cues:
    • Global ($G$): global-average pooling followed by $1 \times 1$ convolutions and a nonlinearity.
    • Local ($L$): local context via a $3 \times 3$ convolution and further transformation.
    • Diversity ($D$): $1 \times 1$ convolution, ReLU, sigmoid, and per-channel variance.
    • Softmax-normalized scalar weights $(w_g, w_l, w_d)$ combine these into $I = \sigma(w_g \cdot G + w_l \cdot L + w_d \cdot D)$.
  • Routing Weight Generation: A $1 \times 1$ convolution transforms $I$ into four spatially varying routing maps $\{W_1, W_2, W_3, W_4\} \in [0,1]^{B \times 1 \times H \times W}$, assigned to four functional paths.
  • Adaptive Information Modulation for Fusion (AIMM-F): Blends local ($F_\text{local}$) and transported ($F_\text{transport}$) features:

$$Y = F_\text{local} \oplus \Big(F_\text{transport} \odot \big[W \cdot (BA + \sigma(\mathrm{std}(F_\text{transport})))\big]\Big)$$

where $\oplus$ denotes addition, $\odot$ is broadcasted multiplication, $BA = 0.5$ is a bias, and $\mathrm{std}(F_\text{transport})$ is the per-channel standard deviation.

  • Adaptive Information Modulation for Self (AIMM-S): Enhances a feature map via self-routing:

$$Y = F \odot \big[IDAS + W \cdot (BA + \sigma(\mathrm{std}(F)))\big]$$

where $IDAS = 1$ provides an identity baseline.

  • Explicit Routing Paths: Four defined flows:
    1. Deep ($P_5$) to mid ($P_4$) via AIMM-F and $W_1$.
    2. Deep ($P_5$) to shallow ($P_3$) via AIMM-F and $W_2$.
    3. Shallow ($P_3$) to mid ($P_4$) via AIMM-F and $W_3$.
    4. Mid ($P_4$) self-enhancement via AIMM-S and $W_4$.

The resulting routed outputs are aggregated and refined by shallow convolutional blocks before emission to the decoder.
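As an illustration, the cue fusion and the AIMM-F blend above can be sketched in NumPy. The $1 \times 1$ and $3 \times 3$ convolutions are replaced here by simple channel statistics, and all function names are illustrative rather than taken from the paper; this is a minimal sketch of the published equations, not the published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def importance(X, w_g, w_l, w_d):
    """Fuse global (G), local (L), and diversity (D) cues into importance I.

    X: feature map of shape (C, H, W). The paper's 1x1/3x3 convolutions
    are replaced by channel means/variances purely for illustration.
    """
    C, H, W = X.shape
    G = np.full((1, H, W), X.mean())          # global-average-pooling cue
    L = X.mean(axis=0, keepdims=True)         # stand-in for a local 3x3 conv
    D = X.var(axis=0, keepdims=True)          # per-location channel variance
    w = np.exp([w_g, w_l, w_d])               # softmax-normalised scalars
    w /= w.sum()
    return sigmoid(w[0] * G + w[1] * L + w[2] * D)

def aimm_f(F_local, F_transport, W, BA=0.5):
    """AIMM-F: Y = F_local + F_transport * [W * (BA + sigmoid(std(F_transport)))]."""
    std = F_transport.std(axis=(1, 2), keepdims=True)   # per-channel std
    return F_local + F_transport * (W * (BA + sigmoid(std)))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16, 16))
I = importance(X, 0.2, 0.5, 0.3)                        # shape (1, 16, 16)
Y = aimm_f(X, rng.standard_normal((8, 16, 16)), I)
print(I.shape, Y.shape)                                 # (1, 16, 16) (8, 16, 16)
```

The outer sigmoid keeps the importance map in $[0,1]$, which is what allows it to feed directly into routing-weight generation.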

2. Routing Formulation and Sparsity Mechanisms

The content-adaptive routing in CASR-PAN is implemented as a spatially varying convex blend at each location $(x, y)$:

$$F_\text{out}(x, y) = (1 - W(x, y))\, F_l(x, y) + W(x, y)\, F_h(x, y)$$

where $F_l$ and $F_h$ are the local and transported feature vectors, respectively, and $W(x, y) \in [0,1]$ is the routing weight. $W$ is generated by applying a sigmoid nonlinearity to the output of a $1 \times 1$ convolution over the fused importance $I$.

Sparsity arises implicitly: the sigmoid activation, together with the bias $BA = 0.5$, encourages low routing weights for uninformative regions. No explicit $L_1$ regularization or hard thresholding is used; paths are “sparse” in the practical sense that many $W(x, y)$ approach zero, effectively suppressing cross-scale propagation at those points.

A plausible implication is that this spatial sparsity focuses computation and transmission on salient structures, such as gas plume boundaries, rather than uniformly across the field of view.
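The convex blend itself is straightforward to sketch. In this minimal NumPy example, $W$ is a hand-crafted map in $[0,1]$ rather than a learned one, chosen so that only a small "salient" patch routes transported features:

```python
import numpy as np

def convex_blend(F_l, F_h, W):
    """Per-location convex blend: F_out(x,y) = (1 - W) * F_l + W * F_h.

    F_l, F_h: (C, H, W) local and transported features; W: (1, H, W) in [0, 1].
    Where W -> 0 the transported feature is suppressed (implicit sparsity).
    """
    return (1.0 - W) * F_l + W * F_h

rng = np.random.default_rng(1)
F_l = rng.standard_normal((4, 8, 8))
F_h = rng.standard_normal((4, 8, 8))
W = np.zeros((1, 8, 8))
W[:, 2:6, 2:6] = 0.9                     # transport only within a salient patch
out = convex_blend(F_l, F_h, W)
# Outside the patch (W = 0) the output equals the local feature exactly.
assert np.allclose(out[:, 0, 0], F_l[:, 0, 0])
```

This mirrors the practical sparsity described above: wherever $W$ is near zero, cross-scale transport is a no-op and only the local path survives.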

3. Network Dataflow and Functional Workflow

The canonical forward pass through CASR-PAN can be summarized as follows:

  1. Importance Estimation: For each input scale $P_s$ ($s \in \{3, 4, 5\}$), compute the global ($G$), local ($L$), and diversity ($D$) cues, and combine them with learned weights to form $I_s$.
  2. Routing Map Generation: Typically, $P_5$'s importance $I_5$ is used to generate all four routing weights through a $1 \times 1$ convolution and sigmoid.
  3. Routing and Fusion:
    • $P_5 \rightarrow P_4$: AIMM-F with $W_1$
    • $P_5 \rightarrow P_3$: AIMM-F with $W_2$
    • $P_3 \rightarrow P_4$: AIMM-F with $W_3$
    • $P_4$ self: AIMM-S with $W_4$
  4. Aggregation and Output: Routed outputs targeting the same scale are summed, passed through a light RepC3 block, and forwarded along with the unmodified $P_5$ to the detection decoder.

This workflow fuses features in a way that is both content- and location-adaptive, differing from traditional fixed or shared-weight aggregation strategies. Pseudocode for the full forward pass is supplied verbatim in the original source (Li et al., 29 Dec 2025).
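The four-path workflow can be wired together in a self-contained NumPy sketch. Up- and downsampling are replaced by nearest-neighbour stand-ins, the four routing maps are derived from $I_5$ with biased sigmoids in place of the paper's $1 \times 1$ convolution, and the final RepC3 refinement is omitted; this is a sketch under those assumptions, not the published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize(F, H, W):
    """Nearest-neighbour resize: stand-in for the paper's up/downsampling."""
    C, h, w = F.shape
    ri = np.arange(H) * h // H
    ci = np.arange(W) * w // W
    return F[:, ri][:, :, ci]

def aimm_f(F_local, F_transport, W, BA=0.5):
    std = F_transport.std(axis=(1, 2), keepdims=True)
    return F_local + F_transport * (W * (BA + sigmoid(std)))

def aimm_s(F, W, BA=0.5, IDAS=1.0):
    std = F.std(axis=(1, 2), keepdims=True)
    return F * (IDAS + W * (BA + sigmoid(std)))

def casr_pan_forward(P3, P4, P5, I5):
    """Sketch of the four explicit routing paths. The four maps W1..W4 are
    derived from I5 via biased sigmoids (illustrative stand-in for the
    1x1 conv + sigmoid); the RepC3 refinement step is omitted."""
    Wmaps = [sigmoid(I5 + b) for b in (-1.0, -0.5, 0.0, 0.5)]
    _, H3, W3 = P3.shape
    _, H4, W4 = P4.shape
    out3 = aimm_f(P3, resize(P5, H3, W3), resize(Wmaps[1], H3, W3))  # P5 -> P3
    mid = aimm_f(P4, resize(P5, H4, W4), resize(Wmaps[0], H4, W4))   # P5 -> P4
    mid = aimm_f(mid, resize(P3, H4, W4), resize(Wmaps[2], H4, W4))  # P3 -> P4
    out4 = aimm_s(mid, resize(Wmaps[3], H4, W4))                     # P4 self
    return out3, out4, P5                    # P5 passes through unmodified

rng = np.random.default_rng(2)
P3, P4, P5 = (rng.standard_normal(s) for s in [(8, 32, 32), (8, 16, 16), (8, 8, 8)])
I5 = rng.random((1, 8, 8))
o3, o4, o5 = casr_pan_forward(P3, P4, P5, I5)
print(o3.shape, o4.shape, o5.shape)   # (8, 32, 32) (8, 16, 16) (8, 8, 8)
```

Note how every output retains its input resolution: routing only modulates what is transported between scales, not the scales themselves.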

4. Computational Complexity and Efficiency Gains

CASR-PAN reduces the redundancy and parameter overhead typical of dense path aggregation networks such as BiFPN. Empirical analysis on the IIG dataset (input size $640 \times 640$) yields the following:

| Variant | AP | AP$_{50}$ | GFLOPs | Parameters |
| --- | --- | --- | --- | --- |
| RT-DETR-R18 | 26.8 | 77.8 | 56.9 | 19.87 M |
| PEG-DRNet (CASR) | 29.8 | 84.3 | 43.7 | 14.93 M |

CASR-PAN adds approximately $45.8$ GFLOPs and $16.94$ M parameters, substantially less than a BiFPN neck (which requires $64.3$ GFLOPs and $20.3$ M parameters). CASR-PAN thus saves about $13$ GFLOPs and $5$ M parameters while increasing AP$_{50}$ by roughly $6.5$ points over a dense neck, reflecting both reduced computational cost and the avoidance of redundant fusion (Li et al., 29 Dec 2025).

5. Ablation Studies and Empirical Findings

CASR-PAN yields substantial improvements in detection metrics and cross-scale discriminability:

  • Introducing the full CASR-PAN on a ResNet18 backbone increases AP from $26.8\%$ to $29.4\%$ and AP$_{50}$ from $77.8$ to $79.1$.
  • In combination with the Physics–Edge Hybrid Backbone (PEG-DRNet), CASR-PAN achieves AP $= 29.8\%$, AP$_{50} = 84.3$, and small-object AP$_S = 25.3\%$, all at $43.7$ GFLOPs and $14.9$ M parameters.
  • Path ablations: removing any of the defined routing flows results in a $1$–$2$ point reduction in AP.
  • When tested against PANet, BiFPN, and NAS-FPN under identical backbones, CASR-PAN achieves $+2.1$–$4.4$ point AP gains with $20$–$50\%$ fewer FLOPs.
  • Small-object AP$_S$ improves by $3$–$5$ points, and AP$_{75}$ rises from about $8\%$ (dense fusion) to $12.3\%$ with CASR-PAN, indicative of improved fine localization (Li et al., 29 Dec 2025).

These findings highlight the benefit of content-selective, sparse multi-scale feature propagation in detection tasks with diffuse, weakly-bounded targets.

6. Relation to Other Cross-Scale Fusion Methods

CASR-PAN is positioned within a landscape of cross-scale fusion methods such as BiFPN and PANet. Unlike these dense architectures, CASR-PAN individually weights and gates the propagation of multi-scale features based on spatial content and edge cues. This suggests that content-adaptive sparse routing is advantageous in domains where informative features are spatially concentrated or weakly localized, such as infrared gas-leak detection, where targets are often small and have indistinct boundaries.

The use of learned importance cues and route-specific gating advances over static or uniformly-applied fusion, allowing the system to prioritize salient features and minimize unnecessary computation. A plausible implication is that such adaptive strategies may generalize to other visual reasoning tasks requiring high sensitivity to fine local structures while maintaining efficiency.

7. Limitations and Prospective Extensions

The concept of “sparse routing” in CASR-PAN refers to implicit suppression of uninformative regions via sigmoid-based gating and an additive bias. No explicit $L_1$ regularization or hard sparsity constraints are applied in the published configuration, though such techniques could be explored for further efficiency. The architecture is tightly integrated with the edge and physics-based modules of PEG-DRNet, and its performance improvements are demonstrated specifically for infrared gas-plume detection.

A plausible extension would be to investigate explicit regularization for increased sparsity, or to generalize the routing mechanism for broader multi-task feature fusion. The effectiveness of CASR-PAN is empirically established against established path aggregation mechanisms, suggesting its potential as a general-purpose, efficient cross-scale routing module in detection architectures (Li et al., 29 Dec 2025).
