
GhostHead Network for Drone Detection

Updated 21 November 2025
  • GhostHead Network is a lightweight detection head architecture that integrates GhostConv and C2f modules to reduce redundancy and enhance small object recognition.
  • It replaces standard convolutions with resource-efficient operations, achieving faster inference speeds and significant reductions in parameters and FLOPs.
  • The design maintains YOLOv11’s three-scale detection pipeline, demonstrating improved precision and mAP on challenging drone imagery datasets.

The GhostHead Network is a lightweight detection head architecture developed as an enhancement to the YOLOv11 object detection framework, designed specifically for drone-based imagery characterized by high-altitude perspectives and small object sizes. It replaces standard convolutional layers and bottleneck blocks with resource-efficient GhostConv modules and C2f blocks, aiming to reduce redundancy, improve detection accuracy (particularly for small objects), and accelerate inference without sacrificing the established multi-scale detection efficacy of YOLOv11 (Jung, 14 Nov 2025).

1. Motivation: Addressing Efficiency and Small Object Detection

Drone-acquired images pose distinct challenges: objects typically span only a few pixels and are difficult to detect. The original YOLOv11 head, constructed from conventional Conv2d layers and a C3k2 bottleneck block, exhibited the following issues:

  • Redundant Feature Computation: Standard convolutions produced superfluous feature maps, inflating parameter and FLOP counts.
  • High Inference Cost: The heavier detection head contributed to slower inference speeds—suboptimal for onboard or real-time drone operations.
  • Subpar Small-Object Recognition: The generic head architecture was insufficiently specialized for the spatial and scale constraints prominent in drone images.

GhostHead was introduced to mitigate these limitations by substituting GhostConv and C2f modules, yielding computationally efficient representations conducive to improved small-object detection and faster inference (Jung, 14 Nov 2025).

2. Architectural Structure of GhostHead

GhostHead preserves YOLOv11's three-scale detection paradigm (feature maps at 80×80, 40×40, 20×20), but at each scale, replaces key head components as follows:

  • Backbone Output: Receives feature maps from the backbone of shape H × W × C.
  • GhostConv: Replaces Conv(k=3, s=2). Each GhostConv comprises:
    • A convolution (1×1 or 3×3) producing m intrinsic feature maps.
    • Channel-wise depthwise convolutions deriving (n − m) "ghost" maps.
    • The final output concatenates intrinsic and ghost maps, totaling n channels.
  • C2f Block: Replaces the C3k2 block (a CSP bottleneck with two k=2 kernels). The C2f process:
    • Splits inputs into two branches.
    • Each branch applies Conv(k=1) → Conv(k=3); a residual connection augments one branch.
    • The outputs are concatenated post-processing.
  • Feature Fusion and Detection: Subsequent upsampling, concatenation, and detection ("yolo") layers remain congruent with YOLOv11.

This design delivers modularity and computational reduction while maintaining fidelity to the pipeline's original representational strengths (Jung, 14 Nov 2025).
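
A minimal PyTorch-style sketch of a C2f-like block following the description above is given below. The conv_bn_act helper, the even channel split, and the class name C2fSketch are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of a two-branch C2f-style block: each branch is Conv(k=1) -> Conv(k=3),
# one branch carries a residual shortcut, and the outputs are concatenated.
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, s=1):
    """Convolution + BatchNorm + SiLU, the usual YOLO building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class C2fSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 2
        self.branch_a = nn.Sequential(conv_bn_act(c_in, c_mid, 1), conv_bn_act(c_mid, c_mid, 3))
        self.branch_b = nn.Sequential(conv_bn_act(c_in, c_mid, 1), conv_bn_act(c_mid, c_mid, 3))
        self.shortcut = conv_bn_act(c_in, c_mid, 1)   # channel-matching projection for the residual add
        self.fuse = conv_bn_act(2 * c_mid, c_out, 1)  # 1x1 fusion after concatenation

    def forward(self, x):
        a = self.branch_a(x) + self.shortcut(x)  # residual-augmented branch
        b = self.branch_b(x)
        return self.fuse(torch.cat([a, b], dim=1))

# Quick shape check on a 40x40 head-scale feature map.
y = C2fSketch(128, 128)(torch.randn(1, 128, 40, 40))
assert y.shape == (1, 128, 40, 40)
```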

3. Mathematical Underpinnings of Ghost Module

The GhostConv module achieves parameter and FLOP reduction through its two-stage output composition:

  • Standard Convolution: For input X ∈ ℝ^{c×h×w} and output Y ∈ ℝ^{h′×w′×n},

    Y = X ∗ f + b,

    with P_std = c·k·k·n parameters and F_std = h′·w′·c·k·k·n FLOPs.

  • Ghost Convolution:

    • Intrinsic maps:

      Y′ = X ∗ f′,  f′ ∈ ℝ^{c×k×k×m}

    • Ghost map generation, using cheap linear (e.g., depthwise) operations Π:

      V_{i,j} = Π_{i,j}(y′_i),  for i = 1…m, j = 1…s, with n = m·s.

    • Output concatenation combines Y′ and the ghost maps:

      Y = [y′_1, V_{1,2}, …, V_{1,s}, …, y′_m, V_{m,2}, …, V_{m,s}]

    • Parameter and computational cost:

      P_ghost = c·k·k·m + m·d·d

      F_ghost = h′·w′·c·k·k·m + h′·w′·m·d·d

    For m ≈ n/2 and small d, the resource consumption is nearly halved.

This explicit decomposition enables high output channel cardinality at a fraction of standard convolutional cost, crucial for real-time and edge-deployable models (Jung, 14 Nov 2025).
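
To make the cost comparison tangible, the following minimal PyTorch sketch implements a GhostConv-style layer per the equations above and counts parameters for c = n = 256, k = d = 3, m = n/2. The class name, the ratio s = 2, and the depthwise choice for Π are assumptions made for illustration rather than the authors' exact code.

```python
# GhostConv-style layer: a primary convolution produces m intrinsic maps,
# a cheap depthwise convolution derives the remaining "ghost" maps, and the
# two sets are concatenated along the channel dimension.
import torch
import torch.nn as nn

class GhostConvSketch(nn.Module):
    def __init__(self, c_in, c_out, k=3, stride=1, d=3, s=2):
        super().__init__()
        m = c_out // s  # intrinsic channels; ghost maps supply the remaining c_out - m
        self.primary = nn.Conv2d(c_in, m, k, stride, padding=k // 2, bias=False)
        # Depthwise (groups=m) convolution plays the role of the cheap linear operation Π.
        self.cheap = nn.Conv2d(m, c_out - m, d, 1, padding=d // 2, groups=m, bias=False)

    def forward(self, x):
        y_intrinsic = self.primary(x)
        y_ghost = self.cheap(y_intrinsic)
        return torch.cat([y_intrinsic, y_ghost], dim=1)

# Parameter comparison for c = n = 256, k = d = 3, m = n/2 = 128:
#   standard: c*k*k*n         = 256*3*3*256           = 589,824
#   ghost:    c*k*k*m + m*d*d = 256*3*3*128 + 128*3*3 = 296,064  (~50% of standard)
ghost = GhostConvSketch(256, 256)
standard = nn.Conv2d(256, 256, 3, padding=1, bias=False)
print(sum(p.numel() for p in ghost.parameters()),     # 296064
      sum(p.numel() for p in standard.parameters()))  # 589824
```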

4. Pipeline Integration and Training Regimen

GhostHead is integrated exclusively into the head network of YOLOv11:

  • Pipeline Consistency: Retains the existing YOLOv11 Backbone (CSPDarkNet, SPPF, C2PSA) and three-scale detection structure (for small, medium, large targets).
  • Feature Fusion: Utilizes GhostConv in lieu of Conv, and C2f instead of C3k2 within the upsampling and scale-fusion logic.
  • Detection Layer: Anchor assignment, loss computation, and NMS are unaltered from the YOLOv11 baseline.

Training employed the VisDrone 2019 dataset (6,471 train, 548 val, 1,610 test images, 10 classes), with input size 640×640 and the YOLOv11n configuration (depth=0.5, width=0.25). The procedure used the default YOLOv11 optimizer settings (SGD or AdamW, lr≈0.01, momentum≈0.9, weight decay≈5e-4); the loss combined CIoU or GIoU loss and Distribution Focal Loss for bounding-box regression with binary cross-entropy for classification. Data augmentation and regularization followed the YOLOv11 defaults (Jung, 14 Nov 2025).
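
A hedged sketch of how such a run could be set up with the standard Ultralytics Python API follows; the model configuration file yolo11n-ghosthead.yaml is hypothetical (a custom YAML in which the head's Conv and C3k2 entries are swapped for GhostConv and C2f), and any setting not listed falls back to the framework defaults.

```python
# Illustrative training configuration mirroring the hyperparameters cited above.
from ultralytics import YOLO

model = YOLO("yolo11n-ghosthead.yaml")  # hypothetical custom model YAML with the GhostHead detection head
model.train(
    data="VisDrone.yaml",   # VisDrone 2019 dataset configuration
    imgsz=640,              # 640x640 input resolution
    optimizer="SGD",        # or "AdamW", per the default YOLOv11 settings
    lr0=0.01,
    momentum=0.9,
    weight_decay=5e-4,
)
```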

5. Quantitative Performance and Comparative Assessment

Relative performance metrics highlight the impact of the GhostHead substitution. The following tables summarize detection and efficiency outcomes:

YOLOv11n vs. YOLO-Drone (GhostHead):

Method Precision (%) Recall (%) F1-Score (%) Inference (ms) mAP@0.5 (%)
YOLOv11n 39.6 30.9 34.7 2.0 29.9
YOLO-Drone 40.0 31.5 35.2 1.8 30.4

Comparison with Other YOLO Variants:

Method Precision (%) Recall (%) F1 (%) GFLOPs mAP@0.5 (%)
YOLOv8 41.2 30.6 35.1 8.2 30.3
YOLOv9 41.0 30.0 34.6 7.9 30.1
YOLOv10 40.6 30.2 34.6 8.2 29.8
YOLOv11n 39.6 30.9 34.7 6.6 29.9
YOLO-Drone 40.0 31.5 35.2 6.7 30.4

Ablation confirmed that all observed improvements (Precision +0.4%, Recall +0.6%, mAP@0.5 +0.5%, inference −0.2 ms) stem solely from the head replacement. No changes to the backbone were involved (Jung, 14 Nov 2025).

6. Advantages, Limitations, and Prospects

Empirical evidence demonstrates that GhostHead achieves:

  • Reduction of Head Module Overhead: A 50% decrease in parameters and FLOPs for the detection head.
  • Speed Enhancement: Inference improved from 2.0 ms to 1.8 ms.
  • Accuracy Boost: Notably, a 0.5% increase in small-object mAP, which is critical for drone scenarios.
  • Maintenance of Training and Detection Paradigm: The approach leverages three-scale detection and robust augmentation/regularization strategies as in YOLOv11.

Reported limitations:

  • Modifications are confined to the head; potential further efficiency and accuracy improvements may be available by extending Ghost modules into the backbone.
  • The scope of validation is limited to YOLOv11n and the VisDrone dataset. Generalization across model scales, datasets, or alternate tasks (such as instance segmentation) remains an open question.

Future research directions:

  • Apply GhostConv and C2f modules to larger YOLOv11 variants to assess scalability of the observed improvements.
  • Incorporate attention mechanisms (CBAM, PSA) with GhostHead for improved small-object focus.
  • Investigate dynamic parameterization of the ghost ratios (m and s) for adaptive computation-accuracy balancing.
  • Extend the framework to video-based or multi-modal (RGB-thermal) drone detection scenarios (Jung, 14 Nov 2025).