
GhostHGNetv2: Efficient Backbone for Detection

Updated 16 November 2025
  • The paper introduces GhostHGNetv2, which replaces heavy 3×3 convolutions with GhostConv modules to cut computational cost by nearly 50% while maintaining accuracy.
  • GhostConv achieves efficiency by generating intrinsic features with 1×1 convolutions and producing additional maps via cost-effective depthwise operations.
  • Hierarchical multi-scale residual fusion integrates features from multiple stages, enhancing detection accuracy and throughput in resource-constrained environments.

GhostHGNetv2 is an efficient backbone architecture designed to balance computational cost against multi-scale representational power, primarily for real-time object detection in resource-constrained environments. It is derived from HGNetv2 by replacing every standard convolution with a GhostConv module and introducing hierarchical multi-scale residual fusion. The resulting design achieves significantly reduced FLOPs and parameter counts while maintaining or improving accuracy and throughput, as demonstrated in anomaly-behavior monitoring settings (Zheng et al., 10 Mar 2025).

1. Architectural Formulation

GhostHGNetv2 is organized into four principal stages of GhostHGBlocks, which replace standard convolutional blocks and are separated by spatial down-sampling via depthwise convolutions. The pipeline is prefaced by a stem block (Conv3×3, stride 2; BN; ReLU), and terminates in a Spatial Pyramid Pooling Fast (SPPF) module. The spatial dimensions halve and the channel count doubles from stage to stage:

  • Input Stem: 640×640×3 → 320×320×16 via Conv3×3 (stride 2).
  • Stage 1: four Ghost-enhanced branches (each 16×320×320) are concatenated, compressed to 32×320×320, and combined with the block input by residual addition.
  • Stages 2–4: each block applies four GhostConvBNAct sub-blocks, concatenation, GhostConv compression, and residual addition, with downsampling preceding each of these stages (inputs 160×160×32 → 80×80×64 → 40×40×128, final stage-4 output 40×40×256).
  • SPPF: applies three successive 5×5 max-pools (equivalent to parallel 5×5, 9×9, and 13×13 pooling), concatenation, and a 1×1 GhostConv on the final maps.

Outputs from stages 2, 3, and 4 feed into the neck, supporting multi-scale feature propagation.
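As an illustrative check of the stage progression, the halve-spatial/double-channel rule stated above can be traced in a few lines of pure Python (the function name and the assumption that stage 1 keeps the stem's resolution are illustrative, not the paper's code):

```python
# Illustrative trace of GhostHGNetv2 stage output shapes (H, W, C), following
# the rule stated above: spatial dimensions halve and channels double per
# stage. Stem width (16) and stage count are taken from the text.

def trace_shapes(h=640, w=640, stem_c=16, stages=4):
    """Return the (H, W, C) shape after the stem and after each stage."""
    # Stem: 3x3 conv, stride 2 -> halves spatial dims, sets base channels.
    h, w, c = h // 2, w // 2, stem_c
    shapes = [("stem", (h, w, c))]
    for i in range(1, stages + 1):
        c *= 2                      # channel count doubles per stage
        if i > 1:                   # stage 1 keeps the stem's resolution;
            h, w = h // 2, w // 2   # later stages are preceded by downsampling
        shapes.append((f"stage{i}", (h, w, c)))
    return shapes

for name, shape in trace_shapes():
    print(name, shape)
```

Running this reproduces the shapes listed above, ending at a 40×40×256 stage-4 output on which SPPF operates.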

2. GhostConv: Mechanism and Efficiency

GhostConv achieves computational efficiency by generating a subset of “intrinsic” features via narrow 1×1 convolution, then producing the remaining feature maps using inexpensive linear transformations (depthwise/group convolutions). The formal constructs are:

  • Intrinsic features: Y_0 = X * W^{(1\times1)} \in \mathbb{R}^{m\times h'\times w'}
  • Ghost features: Y_i = \Phi_i(Y_0) for i = 1, \ldots, s-1, typically small depthwise convolutions
  • Output: Y = \mathrm{Concat}(Y_0, Y_1, \ldots, Y_{s-1}) \in \mathbb{R}^{sm\times h'\times w'}, where sm \approx n

The FLOP counts are

\mathrm{FLOPs}_\text{std} = n\,c\,h'w'\,k^2 \qquad \mathrm{FLOPs}_\text{ghost} = m\,c\,h'w' + (s-1)\,m\,h'w'\,d^2

Choosing m = n/s brings the cost to roughly 1/s of a standard convolution or lower, since the intrinsic maps are produced with a 1×1 rather than a k×k kernel.
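The two cost formulas can be evaluated directly; the helper names and the example layer sizes below are illustrative assumptions:

```python
# Sketch of the FLOP comparison above. Symbols follow the text: c input
# channels, n output channels, h'/w' output size, k standard kernel size,
# s ghost expansion factor, d cheap-operation kernel size.

def flops_standard(c, n, hp, wp, k):
    # Dense k x k convolution: n * c * h' * w' * k^2
    return n * c * hp * wp * k * k

def flops_ghost(c, n, hp, wp, s, d):
    m = n // s                                # intrinsic channels, m = n/s
    intrinsic = m * c * hp * wp               # 1x1 convolution
    cheap = (s - 1) * m * hp * wp * d * d     # depthwise transforms
    return intrinsic + cheap

# Example layer: 128 -> 256 channels on a 40x40 grid, s = 2, d = k = 3.
std = flops_standard(c=128, n=256, hp=40, wp=40, k=3)
ghost = flops_ghost(c=128, n=256, hp=40, wp=40, s=2, d=3)
print(std / ghost)
```

For this example layer the ghost cost is well under half the standard 3×3 cost, consistent with (and stronger than) the ~1/s estimate, because the intrinsic maps use a 1×1 kernel.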

Because GhostConv preserves input and output shapes, replacing 3×3 convolutional layers with GhostConvBNAct modules throughout HGNetv2 is structurally straightforward and reduces redundant computation by roughly 50%.
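The mechanism can be sketched in numpy: a 1×1 pointwise convolution produces the m intrinsic maps, a 3×3 depthwise convolution over them produces the ghost maps, and the two halves are concatenated. Weights here are random and the BN/activation of the full GhostConvBNAct module is omitted; this illustrates shapes and data flow, not a trained layer.

```python
import numpy as np

def ghost_conv(x, n_out, s=2, d=3, rng=np.random.default_rng(0)):
    """Minimal GhostConv forward sketch. x: (C, H, W) -> (n_out, H, W)."""
    c, h, w = x.shape
    m = n_out // s
    w_point = rng.standard_normal((m, c)) * 0.1            # 1x1 conv weights
    w_cheap = rng.standard_normal((s - 1, m, d, d)) * 0.1  # depthwise kernels

    # Intrinsic maps: a 1x1 convolution is per-pixel channel mixing.
    y0 = np.einsum("oc,chw->ohw", w_point, x)

    # Ghost maps: d x d depthwise convolution on y0 (zero padding, stride 1).
    p = d // 2
    y0p = np.pad(y0, ((0, 0), (p, p), (p, p)))
    ghosts = []
    for k in range(s - 1):
        y = np.zeros_like(y0)
        for i in range(d):
            for j in range(d):
                y += w_cheap[k, :, i, j][:, None, None] * y0p[:, i:i + h, j:j + w]
        ghosts.append(y)

    return np.concatenate([y0] + ghosts, axis=0)           # (n_out, H, W)

out = ghost_conv(np.ones((8, 16, 16)), n_out=32)
print(out.shape)
```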

3. Hierarchical Multi-Scale Residual Fusion

Each GhostHGBlock internally concatenates and compresses multi-branch features:

Z = \mathrm{Concat}[Y_0, \ldots, Y_{L-1}], \qquad Z' = \mathrm{GhostConv}_{1\times1}(Z)

The block output is the sum of Z' and the block input, passed through BatchNorm and ReLU.

At the cross-stage (hierarchical) level, the outputs of stages 1–4 (F_1, \ldots, F_4) are each projected by a 1×1 convolution \psi_i and resized (by up- or down-sampling) to a shared grid H \times W:

F_{\text{hf}} = \sum_{i=1}^{4} \mathrm{Resize}\bigl(\psi_i(F_i)\bigr)

This integrates coarse-to-fine information at minimal cost, enlarging the effective receptive field without heavy convolutions at every scale.
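A minimal numpy sketch of this fusion follows. The channel widths, the shared 80×80 grid, and the use of nearest-neighbour resizing are illustrative assumptions; the paper's exact resize operator may differ.

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W); per-pixel channel mix.
    return np.einsum("oc,chw->ohw", w, x)

def resize_nn(x, h_out, w_out):
    # Nearest-neighbour resize of (C, H, W) to (C, h_out, w_out).
    c, h, w = x.shape
    rows = (np.arange(h_out) * h) // h_out
    cols = (np.arange(w_out) * w) // w_out
    return x[:, rows][:, :, cols]

def fuse(stage_feats, c_out=64, grid=(80, 80), rng=np.random.default_rng(0)):
    """F_hf = sum_i Resize(psi_i(F_i)) with random 1x1 projections psi_i."""
    h, w = grid
    fused = np.zeros((c_out, h, w))
    for f in stage_feats:
        psi = rng.standard_normal((c_out, f.shape[0])) * 0.1  # 1x1 projection
        fused += resize_nn(conv1x1(f, psi), h, w)
    return fused

# Stage outputs shaped per the halve-spatial/double-channel progression.
feats = [np.ones((32, 320, 320)), np.ones((64, 160, 160)),
         np.ones((128, 80, 80)), np.ones((256, 40, 40))]
print(fuse(feats).shape)
```

Only 1×1 projections and index-based resizing are involved, which is why this fusion adds little overhead compared with dense convolutions at every scale.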

4. Network Complexity and Resource Requirements

Comparative complexity and resource metrics for GhostHGNetv2 versus the original HGNetv2 backbone (using YOLOv8-n scale equivalents):

Backbone              Parameters (M)   FLOPs (GFLOPs)   FPS (CPU)
YOLOv8-n (baseline)   6.2              8.9              33
HGNetv2               6.9              6.9              54
GhostHGNetv2          4.6              4.3              56

Relative to the YOLOv8-n baseline, GhostHGNetv2 thus cuts FLOPs by roughly 52% (8.9 → 4.3 GFLOPs) and parameters by about 26% (6.2 → 4.6 M); against HGNetv2 it removes a third of the parameters (6.9 → 4.6 M) and 38% of the FLOPs. The hierarchical fusion adds minimal overhead, since it relies only on 1×1 convolutions and interpolation.
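The reduction percentages follow directly from the table above; a quick arithmetic check in pure Python:

```python
# Percentage reductions computed from the table above (YOLOv8-n baseline:
# 6.2 M params, 8.9 GFLOPs; GhostHGNetv2: 4.6 M params, 4.3 GFLOPs).

def pct_reduction(baseline, new):
    return 100.0 * (1.0 - new / baseline)

flops_drop = pct_reduction(8.9, 4.3)   # FLOPs vs. YOLOv8-n
param_drop = pct_reduction(6.2, 4.6)   # parameters vs. YOLOv8-n
print(round(flops_drop, 1), round(param_drop, 1))
```

The FLOPs figure matches the −51.7% quoted in the empirical results below.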

5. Empirical Performance and Validation

In anomaly-behavior detection (falling, fighting, smoking), GhostHGNetv2 underpins HGO-YOLO, with the following results:

  • mAP@0.5: 87.4%
  • Recall: 81.1%
  • Inference rate: 56 FPS (single CPU)
  • Model size: 4.6 MB, 4.3 GFLOPs
  • Outperforms YOLOv8n by +3.0% mAP with 51.7% fewer FLOPs and a 1.7× CPU speedup.
  • On Jetson Orin Nano: stable 42 FPS.

Ablation studies isolate the GhostConv contribution: integrating GhostHGBlocks into HGNetv2 accounts for a +1.2 pt mAP gain and a 48% decrease in FLOPs, validating the efficacy of cheap feature map generation strategies in practical deployments.

6. Context and Implications

The GhostHGNetv2 backbone targets scenarios where real-time inference is necessary under strictly bounded hardware resources, such as anomaly detection on edge devices. Adoption of GhostConv operations maintains the diversity and depth of feature extraction, while hierarchical fusion preserves spatial context without incurring the cost of full dense convolutions at every scale.

A plausible implication is that the architectural principle—replacing heavy convolutions and embracing multi-scale fusion via light linear transforms—can be extended to other detection and segmentation networks requiring aggressive resource constraints.

7. Integration and Significance in Modern Detection Frameworks

GhostHGNetv2 is compatible with contemporary lightweight heads such as OptiConvDetect (partial parameter sharing across the classification and regression branches), which further reduces detection-head FLOPs by 41% with no reported accuracy loss.

By enabling high-throughput, accurate, and efficient object detection, GhostHGNetv2 contributes a robust architectural alternative to conventional convolutional backbones in practical anomaly monitoring systems (Zheng et al., 10 Mar 2025).
