
LiM-YOLO: Efficient Ship Detection Framework

Updated 17 December 2025
  • The paper introduces a data-driven pyramid level shift and a normalized auxiliary branch to address scale disparities and improve ship detection accuracy in optical remote sensing.
  • The architecture reconfigures YOLO by replacing the standard P3–P5 detection heads with a P2–P4 setup, eliminating redundant layers and reducing computational cost.
  • Empirical evaluations show that LiM-YOLO achieves state-of-the-art mAP and F1 scores, particularly for small vessels, while maintaining a low model size and GFLOPs.

LiM-YOLO (Less is More with Pyramid Level Shift and Normalized Auxiliary Branch) is a high-resolution ship-detection framework based on YOLO, explicitly designed to address the extreme scale disparities, morphological anisotropy, and spatial aliasing challenges present in optical remote sensing imagery. The architecture is derived from the Ultralytics YOLOv9-E baseline, introducing a pyramid level shift for detection, data-driven selection of detection strides, and a specialized, normalization-stabilized auxiliary training branch. Through empirical evaluation on satellite ship datasets, LiM-YOLO demonstrates state-of-the-art detection accuracy, particularly for small and narrow vessels, while maintaining low computational cost and model size (Kim et al., 10 Dec 2025).

1. Architectural Overview and Detection Head Reconfiguration

LiM-YOLO is constructed atop the YOLOv9-E object detector, which utilizes a CSP-style backbone with programmable gradient information (PGI) blocks and a feature-pyramid neck (FPN-like) for multi-scale feature fusion. The backbone integrates GELAN for deep feature extraction, while the neck in standard deployments aggregates feature maps at strides of 8 (P3), 16 (P4), and 32 (P5).

LiM-YOLO fundamentally reconfigures the detection head and backbone:

  • The detection heads operate at P2 (stride 4), P3 (8), and P4 (16), replacing the standard P3–P5 configuration.
  • Each detection head comprises a 1×1 convolution, bifurcated into class and oriented bounding box (OBB) regression outputs.
  • All P5-stage components are pruned from the backbone and neck to reduce both parameter count and GFLOPs, eliminating layers whose effective receptive field (ERF) encompasses no additional discriminative context for ship detection in 1024×1024 inputs. Figure 1 in the paper details the full architecture.
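The stride shift changes how densely each head samples the input; a minimal sketch (assuming square 1024×1024 inputs, as in the paper; the helper name is illustrative) of the prediction-grid cell counts per head:

```python
# Prediction-grid sizes implied by the pyramid level shift, for a
# 1024x1024 input. Levels P2..P5 correspond to strides 4, 8, 16, 32.
strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}

def grid_cells(input_size, levels):
    """Cells per detection head: (input_size / stride)^2 for each level."""
    return {lv: (input_size // strides[lv]) ** 2 for lv in levels}

baseline = grid_cells(1024, ["P3", "P4", "P5"])   # standard YOLOv9 heads
lim_yolo = grid_cells(1024, ["P2", "P3", "P4"])   # LiM-YOLO's shifted heads

print(baseline)  # {'P3': 16384, 'P4': 4096, 'P5': 1024}
print(lim_yolo)  # {'P2': 65536, 'P3': 16384, 'P4': 4096}
```

The shift trades the coarse 32×32 P5 grid for a dense 256×256 P2 grid, which is what allows sub-cell-sized vessels under the old configuration to occupy at least one cell under the new one.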

To enhance training stability, an auxiliary branch is included: a CBLinear (channel-wise 1×1 linear convolution) is retained in each PGI block during training and detached at inference.

2. Pyramid Level Shift and Sampling Theory Foundation

The principal innovation in LiM-YOLO is the data-driven pyramid level shift, justified by an analysis of object scale distributions and sampling criteria:

  • Ship-scale statistics (Table 2) indicate mean minor axes of approximately 17.34 px (with a 95% range of 4–64 px) and major axes around 70.24 px (95% range 8–256 px).
  • The minor-axis occupancy ratio, ρ_minor = L_minor / S, averages ≈ 0.54 at the conventional P5 stride S = 32, indicating sub-pixel aliasing for the bulk of objects.
  • Applying Shannon's sampling theorem, resolution of widths down to L_minor ≈ 4 px mandates detection at a grid stride S ≤ 4 (P2).
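The occupancy-ratio argument above can be checked numerically (a sketch using the paper's reported scale statistics; the function name is illustrative):

```python
# Minor-axis occupancy ratio rho = L_minor / S for candidate strides.
# L_minor follows the paper's reported statistics: mean 17.34 px,
# 95% range 4-64 px.
def occupancy(l_minor, stride):
    """Object extent in prediction-grid cells at a given stride."""
    return l_minor / stride

# At the conventional P5 stride (S = 32) the mean ship minor axis spans
# only ~0.54 cells, i.e. it is sub-pixel on the prediction grid.
assert round(occupancy(17.34, 32), 2) == 0.54

# Guaranteeing rho >= 1 even for the smallest vessels (L_minor ~ 4 px)
# requires a stride S <= 4, i.e. the P2 head.
smallest = 4
ok_strides = [s for s in (4, 8, 16, 32) if occupancy(smallest, s) >= 1]
print(ok_strides)  # [4]
```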

Ablation analysis of theoretical (TRF) and effective (ERF) receptive fields (Table 3, Figure 2) demonstrates that the ERF at P4 (≈ 1024 px) already covers the entirety of the highest-resolution input, rendering deeper layers (P5, S = 32) redundant for contextual aggregation.

The final adopted detection pyramid thus consists of the higher-resolution P2 head to guarantee ρ_minor ≥ 1, and prunes P5 to improve efficiency and preserve detail for the smallest vessels.

3. Group Normalized CBLinear for Stable Auxiliary Supervision

The CBLinear auxiliary branch embedded in each PGI block is designed to preserve mutual information through a linear projection, maintaining reversible information flow. However, the typical use of BatchNorm becomes unstable in the micro-batch settings endemic to large 1024×1024 imagery, due to high variance in estimated moments.

To address this, LiM-YOLO introduces Group Normalization (GN, with group count G = 32) immediately before the CBLinear 1×1 convolution, without non-linear activations to retain reversibility. The process is defined for each group g by:

  • μ_g = mean(F_g)
  • σ_g² = Var(F_g)
  • F̂_g = (F_g − μ_g) / √(σ_g² + ε)
  • GN-CBLinear(F) = Conv1×1(γ F̂ + β)

This GN-CBLinear block (Eq 5) robustly stabilizes gradient propagation through the auxiliary branch and improves convergence with batch-independent normalization.
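A minimal NumPy sketch of the GN-CBLinear computation (group normalization followed by a 1×1 linear projection, with no non-linearity; shapes, variable names, and the toy sizes are illustrative, not the authors' code):

```python
import numpy as np

def gn_cblinear(F, W, gamma, beta, G=32, eps=1e-5):
    """GroupNorm (no activation, preserving reversibility) then a 1x1 conv.

    F: (C, H, W) feature map; W: (C_out, C) 1x1-conv weights;
    gamma/beta: (C,) per-channel affine parameters; G: group count.
    """
    C, H, Wd = F.shape
    Fg = F.reshape(G, C // G, H, Wd)                 # split channels into G groups
    mu = Fg.mean(axis=(1, 2, 3), keepdims=True)      # per-group mean
    var = Fg.var(axis=(1, 2, 3), keepdims=True)      # per-group variance
    Fh = ((Fg - mu) / np.sqrt(var + eps)).reshape(C, H, Wd)
    Fh = gamma[:, None, None] * Fh + beta[:, None, None]
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", W, Fh)

rng = np.random.default_rng(0)
F = rng.normal(size=(64, 8, 8))
out = gn_cblinear(F, rng.normal(size=(32, 64)), np.ones(64), np.zeros(64))
print(out.shape)  # (32, 8, 8)
```

Because the normalization statistics are computed per sample rather than per batch, the projection behaves identically at batch size 2 and at large batch sizes, which is the stability property the text attributes to GN here.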

4. Training Regimen and Dataset Protocol

Training is performed on high-resolution 1024×1024 image patches:

  • Sliding-window cropping is applied with a 256 px overlap for training and no overlap during validation. Blank image patches are discarded.
  • Zero-padding is applied to images smaller than 1024×1024.
  • Due to memory constraints (48 GB NVIDIA A6000), the batch size is set to 2.
  • Optimization utilizes Adam with an initial learning rate of 0.001, cosine annealing to 0.0001 over 100 epochs.
  • Loss functions follow the YOLOv9 default: binary cross-entropy for classes/objectness, and IoU-based regression (GIoU or CIoU) plus an angle-offset loss for OBBs.
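The patch-extraction protocol above can be sketched as follows (window offsets for 1024 px tiles with a 256 px training overlap; the function name and the border-clamping behavior are assumptions for illustration, not the authors' code):

```python
def window_starts(length, tile=1024, overlap=256):
    """Top-left offsets of sliding-window tiles along one image axis.

    Windows step by (tile - overlap); a final window is clamped so it
    ends exactly at the image border. Images smaller than `tile` yield a
    single window and are zero-padded to full size instead.
    """
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:        # clamp final window to the border
        starts.append(length - tile)
    return starts

# Training crops use a 256 px overlap; validation uses no overlap.
print(window_starts(2048))               # [0, 768, 1024]
print(window_starts(2048, overlap=0))    # [0, 1024]
```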

5. Empirical Results, Ablations, and Comparative Performance

LiM-YOLO is validated on SODA-A, DOTA-v1.5, FAIR1M-v2.0, and ShipRSImageNet-V1, all with oriented bounding box ship annotations. Summarized findings include:

Ablation Studies (Tables 6–9):

  • Baseline YOLOv9-E (P3–P5): mAP50–95 ≈ 0.637 / 0.736 / 0.285 / 0.414.
  • Adding the P2 head (P2–P5): minor gain in mAP, but +17% GFLOPs.
  • Shifting to P2–P4 (pruned P5): mAP50–95 increases to 0.660 / 0.744 / 0.290 / 0.428; parameters reduced by 64%, GFLOPs reduced by 3.5%.
  • Dropping P4 (P2–P3 only) collapses detection of large objects.
  • Incorporating GN-CBLinear into the auxiliary branch (P2–P4 + GN): further absolute mAP50–95 gain of +0.002–0.018.

State-of-the-Art Benchmarking (Table 10):

  • Compared against YOLOv8x, YOLOv10x, YOLO11x, YOLOv12x, RT-DETR-X on the Integrated Ship Detection dataset.
  • LiM-YOLO: 21.16M parameters (smallest), 189.4 GFLOPs, 26.7 ms/image inference, mAP50 = 0.832, mAP50–95 = 0.600 (best), F1 = 0.791, Precision = 0.839, Recall = 0.748.
  • The next closest mAP50–95 is 0.566 (YOLOv8x).

Class-wise improvements (Table 11, Figure 3):

  • Small classes (e.g., Motorboat, Sailboat) gain up to +6–12 mAP50–95 points due to the presence of the P2 head.
  • Large classes (e.g., Aircraft Carrier) retain high detection, as ERF(P4) suffices.
  • GN-CBLinear notably recovers mid-range categories and yields the best average mAP50–95 = 0.448 versus the baseline's 0.414.

Qualitative Results (Figure 4):

  • Detects small or occluded vessels missed by baseline models.
  • Resolves cases of closely packed or omitted ships across all datasets.

6. Relevance, Implications, and Limitations

LiM-YOLO directly addresses the limitations of traditional object detection pyramids for high-resolution, small-object tasks, establishing that deeper, larger-stride layers are not necessary for remote sensing contexts dominated by small and mid-size targets. The pyramid level shift and auxiliary branch normalization provide a basis for efficient, accurate single-stage detectors in dense, anisotropic scenarios.

The model’s careful alignment of detection grid with empirical object size statistics, along with rigorous application of Nyquist sampling theory, represents a methodological advancement for cross-domain object detection. Potential extensions may apply similar strategies to other domains where object scale statistics and imaging constraints diverge from general-purpose benchmarks.

A plausible implication is that further refinement of detection head placement—driven by precise measurement of object size distributions and context requirements—remains an underexplored axis for domain-specific detector efficiency.

When contrasted with approaches such as LAM-YOLO—which augments the YOLOv8 framework for drone-based detection of small objects using advanced attention mechanisms, extra auxiliary heads for finer scales, and specialized IoU loss functions (Zheng et al., 1 Nov 2024)—LiM-YOLO’s unique contribution is the principled, data-driven reduction and realignment of the detection pyramid for high-resolution imagery. Whereas LAM-YOLO primarily deploys architectural and attention augmentations to counter occlusion and lighting variations, LiM-YOLO’s improvements stem from fundamental analysis of scale and sampling in remote sensing images, providing a complementary perspective on specialized object detector design.

References

  • “LiM-YOLO: Less is More with Pyramid Level Shift and Normalized Auxiliary Branch for Ship Detection in Optical Remote Sensing Imagery” (Kim et al., 10 Dec 2025)
  • “LAM-YOLO: Drones-based Small Object Detection on Lighting-Occlusion Attention Mechanism YOLO” (Zheng et al., 1 Nov 2024)
