
YOLO-Drone with GhostHead

Updated 16 December 2025
  • YOLO-Drone with GhostHead is a lightweight object detection framework that uses ghost modules to optimize computation and detect small, low-contrast objects effectively.
  • Its architectural innovations include replacing standard convolutions with GhostConv variants, decoupled detection heads, and attention modules, achieving up to 40% reductions in parameters and FLOPs.
  • This approach is practical for real-time drone and aerial imagery tasks, offering robust performance in challenging conditions and enabling efficient deployment on resource-constrained platforms.

YOLO-Drone with GhostHead refers to a family of lightweight, high-performance object detectors based on the YOLO (You Only Look Once) architecture, where the detection "head" or, in several instances, the entire feature extraction pipeline is modified to include Ghost modules (GhostConv, C3Ghost, GhostBottleneck). These detectors are specifically optimized for scenarios where small or low-contrast objects, such as drones in aerial or infrared imagery, must be robustly and efficiently detected despite poor visibility, scale variation, or resource constraints. The GhostHead modification enables substantial parameter and computational savings by generating additional feature maps via cheap transformations of intrinsic feature maps produced by standard convolution, while reducing redundancy compared to conventional convolutional layers.

1. Design and Architectural Principles

Several instantiations of YOLO-Drone with GhostHead have been proposed, with integration points ranging from replacing only the detection head (as in YOLO-Drone for the VisDrone dataset (Jung, 14 Nov 2025)) to more extensive architectural revisions where backbone, neck, and head are all ghost-enhanced (as in GAANet for IR drone detection (Khan et al., 2023) and EGD-YOLOv8n for multimodal drone-bird discrimination (Sarkar et al., 12 Oct 2025)).

  • GhostConv (Ghost Convolution): The mathematical formulation is consistent across implementations. For an input tensor X and primary kernel f':

    1. Compute m intrinsic feature maps with a standard convolution:

       Y' = X * f'

    2. For each intrinsic map y'_i, generate s "ghost" feature maps by a cheap linear operation Φ_{i,j}:

       y_{ij} = Φ_{i,j}(y'_i),  i = 1..m, j = 1..s

    3. Concatenate the intrinsic and ghost maps:

       Y = Concat(Y', {y_{ij}}_{i=1..m, j=1..s})

  • C3Ghost/C2f Blocks: Bottleneck modules such as YOLOv5/v11's C3 or C2f blocks are replaced by versions that utilize GhostConv and cross-stage partial connections, reducing both memory and runtime.
  • Multi-Scale Heads: Many variants (e.g., GAANet, GL-YOMO) introduce additional detection heads at finer spatial resolution (e.g., P2 for extra-small features), boosting recall for x-small object sizes.
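The GhostConv formulation above can be sketched as a small PyTorch module. The use of a depthwise convolution as the cheap Φ operation and the split ratio s = 2 are assumptions modeled on the original GhostNet design, not the exact configuration of any of the cited detectors:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal GhostConv sketch: m intrinsic maps from a standard conv,
    plus 'ghost' maps produced by cheap depthwise convolutions."""

    def __init__(self, c_in, c_out, k=1, s_ratio=2, dw_k=5):
        super().__init__()
        c_intrinsic = c_out // s_ratio            # m intrinsic channels
        c_ghost = c_out - c_intrinsic             # remaining ghost channels
        # Primary convolution: Y' = X * f'
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_intrinsic, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_intrinsic), nn.ReLU(inplace=True))
        # Depthwise conv = cheap linear op Phi applied per intrinsic map
        self.cheap = nn.Sequential(
            nn.Conv2d(c_intrinsic, c_ghost, dw_k, padding=dw_k // 2,
                      groups=c_intrinsic, bias=False),
            nn.BatchNorm2d(c_ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        y_prime = self.primary(x)                 # intrinsic maps Y'
        ghosts = self.cheap(y_prime)              # y_ij = Phi_ij(y'_i)
        return torch.cat([y_prime, ghosts], dim=1)  # Concat(Y', {y_ij})
```

A drop-in replacement for a standard convolution with the same output channel count, at roughly half the weights.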

2. Core Methodological Advancements

The collective impact of GhostHead modifications and associated enhancements can be categorized as follows:

  • Parameter and FLOP Reduction: Substituting standard convolutions with GhostConv or GhostBottlenecks typically lowers the parameter count and FLOPs by 30–50% in affected modules, with negligible or even positive effect on accuracy. For example, GL-YOMO reduces model size from 26M to 14M and FLOPs from 18.6 to 10.6 with ghosting and attention (Liu et al., 10 Oct 2024); EGD-YOLOv8n shrinks key layers by ∼40% (Sarkar et al., 12 Oct 2025).
  • Autonomous Anchor Adaptation: GAANet performs auto-anchor computation via K-means on label dimensions combined with genetic evolution to maximize CIoU recall, optimizing anchor box selection for tiny object scenarios (Khan et al., 2023).
  • Decoupled & Specialized Detection Heads: In EGD-YOLOv8n and GAANet, detection heads are decoupled into branches for classification and box regression, each leveraging ghost operations and, in EGD-YOLOv8n, deformable convolution for adaptive receptive fields (Sarkar et al., 12 Oct 2025).
  • Attention Mechanisms: EMA (Efficient Multi-scale Attention) and CPAM (Channel & Position Attention Module) are systematically applied in the backbone and neck in EGD-YOLOv8n and GL-YOMO, respectively, to sharpen representation of both RGB and IR cues (Sarkar et al., 12 Oct 2025, Liu et al., 10 Oct 2024).
  • Motion-Aware Fusion: GL-YOMO integrates frame-based motion analysis using template matching and Kalman filtering, supplementing low-confidence YOLO outputs to further improve tiny UAV recall rates (Liu et al., 10 Oct 2024).
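The 30–50% parameter reduction claimed above can be checked with simple arithmetic. The kernel sizes and ratio s = 2 below are illustrative assumptions, not values taken from the cited papers:

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, dw_k=5):
    """GhostConv replacement: a primary conv producing c_out / s
    intrinsic maps, plus one cheap depthwise filter per ghost map."""
    m = c_out // s
    primary = conv_params(c_in, m, k)
    cheap = m * (s - 1) * dw_k * dw_k      # one dw_k x dw_k filter per ghost map
    return primary + cheap

std = conv_params(128, 128, 3)             # 147456 weights
ghost = ghost_params(128, 128, 3)          # 75328 weights
print(f"saving: {1 - ghost / std:.1%}")    # → saving: 48.9%
```

For a typical 128-channel 3 × 3 layer, the replacement roughly halves the weight count, consistent with the 30–50% range reported across the cited variants.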

3. Training Pipelines and Datasets

Key datasets include VisDrone (YOLO-Drone, 10 classes, >8k images), a custom multi-class IR drone dataset (GAANet: 5k+ images), the VIP Cup 2025 RGB–IR drone-vs-bird set (EGD-YOLOv8n: >45k pairs), and custom fixed-wing plus drone-bird video datasets (GL-YOMO: >100k frames) (Jung, 14 Nov 2025, Khan et al., 2023, Sarkar et al., 12 Oct 2025, Liu et al., 10 Oct 2024). Pipelines typically use the following strategies:

  • Input resizing: Standard input sizes are 640 × 640 for VisDrone/VIP Cup/GL-YOMO, but GAANet uses 256 × 256 tiles matching IR camera output.
  • Augmentation: Mosaic, HSV jitter, random flip, mixup, noise, and specialized sharpening/deblurring (EGD-YOLOv8n) address variance in imaging conditions.
  • Optimizers: AdamW (GAANet), SGD with momentum (EGD-YOLOv8n, GL-YOMO).
  • Losses: CIoU or GIoU for localization, BCE for objectness/class, DFL for regression discretization (YOLOv11), variable loss term weighting.
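As a concrete instance of the localization losses listed above, here is a plain-Python sketch of the standard CIoU term (IoU penalized by normalized center distance and an aspect-ratio consistency term); the corresponding loss is 1 − CIoU:

```python
import math

def ciou(b1, b2):
    """Complete IoU between two (x1, y1, x2, y2) boxes; assumes
    well-formed boxes with positive width and height."""
    ax1, ay1, ax2, ay2 = b1
    bx1, by1, bx2, by2 = b2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # squared centre distance rho^2, normalised by the squared
    # diagonal c^2 of the smallest enclosing box
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1))
        - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v
```

Identical boxes score 1.0; any center offset or aspect-ratio mismatch pushes the score below the plain IoU, which is what gives CIoU its better gradients for small-object regression.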

4. Quantitative Performance and Ablation

Performance gains attributed to GhostHead integration are consistently observed across studies:

| Model | Dataset | mAP@0.5 (%) | Precision (%) | Recall (%) | Speed | Params/FLOPs Improvement |
|---|---|---|---|---|---|---|
| YOLO-Drone (w/ GhostHead) | VisDrone | 30.4 (+0.5) | 40.0 (+0.4) | 31.5 (+0.6) | 1.8 ms (-0.2) | ~20% lower head FLOPs |
| EGD-YOLOv8n ("Fusion") | VIP Cup 2025 | 88.5 | 90.1 | — | 54.8 FPS | ~40% fewer key parameters |
| GAANet | IR custom | 97.6 (+2.5) | 96.2 (+1.4) | 90.2 (+2.3) | 13.7 ms | Full Ghost backbone/head |
| GL-YOMO (YOLO detector) | Drone-vs-Bird | 88.0 | 91.0 | 81.1 | 21.6 FPS | ~40% FLOPs reduction |

Ablation studies confirm that:

  • GhostConv/C3Ghost alone yields modest accuracy gains and large efficiency gains.
  • The addition of extra small-object heads boosts x-small target recall (~+13% in GAANet).
  • Attention blocks and deformable detection heads are synergistic with GhostConv for robust real-world detection (Khan et al., 2023, Sarkar et al., 12 Oct 2025, Liu et al., 10 Oct 2024).

5. Application Scenarios and Limitations

GhostHead-enabled YOLO-Drone models are widely adopted in high-altitude drone detection, drone vs. bird discrimination, night-time/IR detection, and fixed-wing UAV tracking. They are suitable for deployment on embedded GPUs and real-time edge platforms where memory and runtime constraints preclude large CNNs.

Reported limitations include:

  • Marginal gains relative to baseline on well-lit, easy datasets (YOLO-Drone gains are ≤ 0.5% on VisDrone (Jung, 14 Nov 2025)).
  • Performance degradation under extreme low-light or highly cluttered backgrounds, motivating future research into transformer or hybrid attention integration (Jung, 14 Nov 2025).
  • Disentangling the contributions of each GhostHead submodule requires further isolated ablation (Jung, 14 Nov 2025, Sarkar et al., 12 Oct 2025).
  • Some gains are observed primarily in x-small or highly imbalanced object distributions.

6. Algorithmic Innovations and System Integration

The GhostHead approach is often embedded in broader detection frameworks that stack several detection and tracking mechanisms:

  • Multi-frame and ROI strategies: GL-YOMO’s dynamic global-local switching and motion-aware supplementation enables tracking of targets when YOLO’s static objectness drops below a threshold, critical for long-range or fast-moving UAVs (Liu et al., 10 Oct 2024).
  • Anchor adaptation: In GAANet, anchor boxes are evolved to optimally cover the empirical distribution of target bounding boxes using K-means and CIoU-driven genetic algorithms (Khan et al., 2023).
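The auto-anchor step can be illustrated with a short NumPy sketch. Note this shows only the K-means initialization with a max-IoU assignment (the standard YOLO-style distance), not GAANet's CIoU-driven genetic-evolution refinement:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """Cluster (w, h) label dimensions, assigning each box to the
    anchor it overlaps best (max IoU, assuming shared top-left corners)."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
                 * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = (wh[:, 0] * wh[:, 1])[:, None] \
              + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
        assign = np.argmax(inter / union, axis=1)   # best-overlap cluster
        for j in range(k):
            if np.any(assign == j):                 # keep anchor if cluster empty
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted small -> large
```

Clustering in IoU space rather than Euclidean (w, h) space keeps small anchors from being absorbed by large ones, which matters for the tiny-object distributions these detectors target.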

7. Outlook and Future Research

Emerging directions for YOLO-Drone with GhostHead research include:

  • Extending Ghost module integration throughout the entire network, not only the head (Jung, 14 Nov 2025).
  • Adding transformer- or context-augmented attention modules to further improve robustness under poor lighting or occlusion (Sarkar et al., 12 Oct 2025).
  • Multimodal fusion (RGB+IR, radar, etc.) in the input stem, along with adaptive attention (Sarkar et al., 12 Oct 2025).
  • Expansion to more complex and naturalistic detection settings, including swarms, formation tracking, and adversarial spoofing scenarios.

The GhostHead paradigm establishes a strong balance between accuracy and efficiency, and reference implementations are available for practical deployment through the authors' open-source repositories (Khan et al., 2023).
