MRS-YOLO Algorithm for Rail Object Detection

Updated 15 October 2025

The paper introduces MRS-YOLO, which employs a novel multi-scale Adaptive Kernel Depth Feature Fusion module and attention-based recalibration to improve object detection in challenging railway scenarios.
It integrates a Re-calibration Feature Fusion Pyramid Network and a refined Spatial and Channel Reconstruction Detect Head to reduce false positives and enhance detection of small or occluded objects.
Model compression via Layer-Adaptive Magnitude-based Pruning reduces parameters by 44.2% and GFLOPs by 17.5%, enabling efficient real-time deployment without sacrificing accuracy.

The MRS-YOLO algorithm is a specialized object detection framework designed to address missed detections, false positives, and computational inefficiency in complex environments, specifically for railroad transmission line foreign object detection. Built on YOLO11, which itself advances the anchor-free YOLOv8 family, MRS-YOLO integrates novel modules for multi-scale feature fusion, advanced attention-based recalibration, refined detection heads, and model compression through channel pruning. These architectural enhancements result in substantial improvements in detection accuracy and efficiency, particularly under the demanding constraints of practical railway monitoring scenarios (Liu et al., 12 Oct 2025).

1. Architectural Modifications for Multi-Scale Feature Fusion

A central innovation in MRS-YOLO is the multi-scale Adaptive Kernel Depth Feature Fusion (MAKDF) module. MAKDF is integrated into the backbone's existing C3k2 module, forming C3k2_MAKDF. This compound module builds on Adaptive Kernel Depth Convolutions (AKDC), which split the input feature map into three groups $(I_1, I_2, I_3)$ and process them in parallel, each with a distinct base kernel size ($K=1$, $K=3$, $K=5$). The parallel branches use a square kernel ($K \times K$), a horizontal band kernel ($1 \times M$), and a vertical band kernel ($M \times 1$), with $M=3K+2$, enabling flexible feature extraction at varied spatial scales.
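As a quick illustration of the relation $M=3K+2$, the following Python sketch lists the kernel shapes implied for each base size. The function name `akdc_kernel_shapes` is hypothetical, and the sketch assumes that all three kernel types are instantiated for every $K$, which the description does not state explicitly.

```python
def akdc_kernel_shapes(k):
    """Return (height, width) of the three kernel types for base size k."""
    m = 3 * k + 2  # band length, M = 3K + 2 as given in the text
    return {
        "square": (k, k),
        "horizontal_band": (1, m),
        "vertical_band": (m, 1),
    }


# Base kernel sizes named in the text
for k in (1, 3, 5):
    print(k, akdc_kernel_shapes(k))
```

Under this reading, the band kernels span 5, 11, and 17 pixels for $K=1$, $K=3$, and $K=5$ respectively, so elongated objects are covered by far fewer weights than a square kernel of the same extent would need.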

The outputs $(F_1, F_2, F_3)$ from each branch are concatenated and projected via a $1 \times 1$ convolution:

$O = C_{1 \times 1}( \text{Concat}(F_1, F_2, F_3) )$

Adaptive weighting coefficients, learned through pooling and softmax normalization, ensure each branch's contribution is tailored to the input's scale and aspect ratio. This enhancement enables the network to adeptly capture foreign objects of diverse sizes and shapes. The C3k2_MAKDF effectively replaces parts of the original backbone to yield improved multi-scale representation power.
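The pooling-plus-softmax weighting can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each branch output is reduced to a single scalar by global average pooling and the softmax is taken directly over those scalars, whereas the actual module presumably passes the pooled descriptors through learned layers. The helper names `global_avg_pool`, `softmax`, and `branch_weights` are hypothetical.

```python
import math


def global_avg_pool(feature_map):
    """Average all values of a 2-D feature map (list of rows) into one scalar."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def branch_weights(branches):
    """One weight per branch: softmax over each branch's pooled descriptor."""
    return softmax([global_avg_pool(b) for b in branches])


# Toy 2x2 maps standing in for the branch outputs F_1, F_2, F_3
f1 = [[0.0, 0.0], [0.0, 0.0]]
f2 = [[1.0, 1.0], [1.0, 1.0]]
f3 = [[2.0, 2.0], [2.0, 2.0]]
weights = branch_weights([f1, f2, f3])
```

The weights are positive and sum to one, so the branch with the strongest pooled response dominates the fusion without silencing the others.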

2. Enhanced Neck Structure: Re-calibration Feature Fusion Pyramid Network

The Re-calibration Feature Fusion Pyramid Network (RCFPN) serves as the model's neck, advancing feature aggregation across spatial hierarchies. RCFPN introduces the Selective Boundary Aggregation (SBA) module, which incorporates a Re-calibration Attention Unit (RAU):

SBA achieves bidirectional calibration: Deep semantic priors are embedded in the shallow stream, while shallow boundary features reinforce the deep stream.
The RAU employs a dual-branch attention scheme, computing cross-resolution correlations via multi-head attention. Adaptive weights are assigned by gating functions (via sigmoids) on linearly mapped features.

Key operations:

$F' = S_1(F), \quad F_2' = S_2(F_2)$

$R(F, F_2) = F \odot F' + F_2 \odot F_2'$

Here, $S_1$ and $S_2$ are parameterized mappings and $\odot$ denotes element-wise multiplication. After the RAU, features are upsampled, concatenated, and further refined with a $3 \times 3$ convolution, enabling context-aware fusion and sharper delineation of object boundaries, which is critical in cluttered railway environments.
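A minimal sketch of the gated sum $R(F, F_2)$ on flat feature vectors is given below. It assumes that each mapping $S_i$ is a scalar linear map followed by a sigmoid and that both streams have already been brought to the same length; the parameters `w1`, `b1`, `w2`, `b2` and the function names are hypothetical stand-ins for learned layers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, weight, bias):
    """S(F): element-wise linear map followed by a sigmoid."""
    return [sigmoid(weight * f + bias) for f in features]


def recalibrate(f, f2, w1=1.0, b1=0.0, w2=1.0, b2=0.0):
    """R(F, F_2) = F * S_1(F) + F_2 * S_2(F_2), all products element-wise."""
    if len(f) != len(f2):
        raise ValueError("both streams must have the same length")
    g1 = gate(f, w1, b1)
    g2 = gate(f2, w2, b2)
    return [a * ga + b * gb for a, ga, b, gb in zip(f, g1, f2, g2)]


shallow = [0.0, 2.0, -1.0]  # toy shallow (boundary) features
deep = [1.0, 0.0, 3.0]      # toy deep (semantic) features
fused = recalibrate(shallow, deep)
```

Each stream is scaled by its own gate before the sum, so a position that is weak in one stream can still be carried by the other.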

3. Refined Detection Head: Spatial and Channel Reconstruction

For final prediction, the Spatial and Channel Reconstruction Detect Head (SC_Detect), based on ScConv, refines features both spatially and across channels:

The Spatial Reconstruction Unit (SRU) applies Group Normalization (GN) and a gating mechanism to selectively enhance spatial regions with high utility:

$X_{\text{out}} = \text{GN}(X)$

$W = \text{Gate}(\text{Sigmoid}(W_2(\text{GN}(X))))$
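The normalize-then-gate pattern can be sketched as follows. Several details are assumptions not given in the text: the scaling applied before the sigmoid is taken to be a per-channel factor `gamma` normalized to sum to one, the gate is a hard threshold at 0.5, and the affine parameters of Group Normalization are omitted. The names `group_norm` and `spatial_gate` are hypothetical.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize x (list of channels, each a list of values) within channel groups."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        out.extend([[(v - mean) / std for v in ch] for ch in group])
    return out


def spatial_gate(x, gamma, num_groups=1, threshold=0.5):
    """Binary mask: 1 where sigmoid(w_c * GN(x)) reaches the threshold, else 0."""
    total = sum(gamma)
    normed = group_norm(x, num_groups)
    mask = []
    for ch, g in zip(normed, gamma):
        w = g / total  # assumed per-channel importance weight
        mask.append([1 if 1.0 / (1.0 + math.exp(-w * v)) >= threshold else 0
                     for v in ch])
    return mask


x = [[0.0, 4.0], [1.0, 3.0]]  # 2 channels, 2 spatial positions
mask = spatial_gate(x, gamma=[1.0, 1.0])
```

With positive `gamma` and a 0.5 threshold, the mask simply marks positions whose activation lies above the group mean; those are the regions treated as informative in this sketch.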

The Channel Reconstruction Unit (CRU) reduces channel redundancy by combining features produced by group-wise convolution (GWC) and point-wise convolution (PWC), then fusing them with weights obtained from adaptive pooling followed by softmax:

$Y = n_1 \cdot Y_1 + n_2 \cdot Y_2$

where $n_1, n_2$ reflect channel group weights. This two-stage reconstruction enhances inter-channel and spatial correlations, resulting in suppression of false positives and robust detection of small or occluded foreign objects.
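The weighted combination $Y = n_1 \cdot Y_1 + n_2 \cdot Y_2$ can be sketched as below, assuming the weights are computed per channel as a two-way softmax over the globally average-pooled descriptors of $Y_1$ and $Y_2$. The function names `channel_means` and `fuse` are hypothetical, and the inputs are toy values rather than real convolution outputs.

```python
import math


def channel_means(y):
    """Global average pooling: one scalar per channel."""
    return [sum(ch) / len(ch) for ch in y]


def fuse(y1, y2):
    """Y = n1 * Y1 + n2 * Y2 with per-channel softmax weights n1, n2."""
    fused = []
    for ch1, ch2, s1, s2 in zip(y1, y2, channel_means(y1), channel_means(y2)):
        peak = max(s1, s2)  # subtract the max for numerical stability
        e1, e2 = math.exp(s1 - peak), math.exp(s2 - peak)
        n1, n2 = e1 / (e1 + e2), e2 / (e1 + e2)
        fused.append([n1 * a + n2 * b for a, b in zip(ch1, ch2)])
    return fused


y1 = [[1.0, 1.0], [0.0, 2.0]]  # 2 channels, 2 positions
y2 = [[1.0, 1.0], [4.0, 4.0]]
y = fuse(y1, y2)
```

In the first channel the two descriptors are equal, so both inputs contribute equally; in the second, the stronger response of `y2` receives most of the weight.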

4. Model Compression via Layer-Adaptive Pruning

The integration of MAKDF, RCFPN, and SC_Detect increases network complexity. To maintain real-time deployment feasibility, MRS-YOLO applies Layer-Adaptive Magnitude-based Pruning (LAMP):

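$\text{score}(u;W) = \frac{ | W[u] | }{ \sum_{v \geq u} | W[v] | }$

Here the weights of a layer are indexed in ascending order of magnitude, so the denominator sums over $W[u]$ and every larger weight in the same layer (the original LAMP formulation uses squared magnitudes in both numerator and denominator). Connections with low LAMP scores are pruned, reducing model size. At a pruning rate of 0.5, parameters decrease from 2,582,932 to 1,442,144 (a 44.2% reduction), and GFLOPs drop from 6.3 to 5.2 (a 17.5% reduction), with only minor impact on accuracy.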
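A small sketch of LAMP-style scoring and global pruning follows. It divides each weight's magnitude by the sum of magnitudes of all weights in the same layer ranked at or above it, then removes the lowest-scoring fraction across all layers. It operates on individual weights in flat lists, whereas MRS-YOLO prunes channels; the function names and toy layers are hypothetical.

```python
def lamp_scores(weights):
    """Score each weight: |w_u| divided by the sum of |w_v| for all v ranked >= u."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    tail = 0.0
    # Walk from the largest magnitude down so `tail` accumulates the suffix sum.
    for i in reversed(order):
        tail += abs(weights[i])
        scores[i] = abs(weights[i]) / tail if tail > 0 else 0.0
    return scores


def global_prune_mask(layers, rate):
    """Keep-mask per layer after dropping the `rate` fraction with the lowest scores."""
    scored = [(s, li, wi)
              for li, layer in enumerate(layers)
              for wi, s in enumerate(lamp_scores(layer))]
    scored.sort(key=lambda t: t[0])
    n_prune = int(len(scored) * rate)
    masks = [[True] * len(layer) for layer in layers]
    for _, li, wi in scored[:n_prune]:
        masks[li][wi] = False
    return masks


layers = [[0.1, -0.2, 0.7], [3.0, -1.0, 6.0]]
masks = global_prune_mask(layers, rate=0.5)
```

Because the score is a ratio within a layer, the second toy layer is not favored for having weights roughly ten times larger, and the largest weight of every layer scores 1, so no layer is pruned away entirely at rates below 1.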

5. Empirical Performance on RailFOD23

On the RailFOD23 dataset for railroad transmission line foreign object detection, MRS-YOLO demonstrates:

Model	mAP50 (%)	mAP50:95 (%)	Params (M)	GFLOPs
YOLO11n (baseline)	94.1	84.1	2.6	6.3
MRS-YOLO	94.8	86.4	1.4	5.2

These results represent improvements of +0.7 points in mAP50 and +2.3 points in mAP50:95, indicating that the architectural enhancements and pruning together improve detection at both the loose (IoU 0.5) and the stricter, averaged IoU thresholds, while cutting the parameter count by about 44% (2.6M to 1.4M) and GFLOPs by about 17.5% (6.3 to 5.2). This suggests high suitability for edge scenarios.
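The reported reductions and gains can be re-derived from the figures quoted above with a few lines of Python. The variable names are illustrative; the numbers are those given in the text and table.

```python
params_before, params_after = 2_582_932, 1_442_144
gflops_before, gflops_after = 6.3, 5.2


def reduction_pct(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


param_cut = reduction_pct(params_before, params_after)
gflops_cut = reduction_pct(gflops_before, gflops_after)
map50_gain = 94.8 - 94.1      # percentage points
map5095_gain = 86.4 - 84.1    # percentage points

print(f"{param_cut:.1f}% params, {gflops_cut:.1f}% GFLOPs")
# prints "44.2% params, 17.5% GFLOPs"
```

The parameter count falls by a little under half, while the compute saving is considerably smaller, which is worth keeping in mind when estimating latency on edge hardware.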

6. Application to Railroad Transmission Line Object Detection

MRS-YOLO is tailored for detection tasks involving plastic bags, balloons, birds’ nests, and debris within railroad scenarios. The MAKDF module supports diverse scale sensitivity; RCFPN ensures effective boundary and semantic fusion; SC_Detect optimizes representation in the presence of challenging backgrounds and object occlusion. The channel pruning ensures that deployment on computationally constrained devices (e.g., drones, cameras) is feasible without sacrificing accuracy. A plausible implication is greater reliability and speed for real-time monitoring systems in railway applications.

7. Summary and Significance

MRS-YOLO exemplifies integration of adaptive kernel convolutions (MAKDF), attention-based multi-level feature fusion (RCFPN), spatial-channel detect heads (SC_Detect), and layer-adaptive pruning (LAMP) as applied to a domain-specific challenge in railway environments. These coordinated enhancements result in increased accuracy and efficiency in foreign object detection. The algorithm’s architectural choices, supported by quantitative evidence on RailFOD23, distinguish MRS-YOLO as a robust, deployable solution for complex, scale-variant, and boundary-ambiguous detection scenarios (Liu et al., 12 Oct 2025).


References

Liu et al. (2025). MRS-YOLO Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning.
