
GhostConv: Efficient Convolution Module

Updated 18 January 2026
  • GhostConv is a convolution module that decomposes standard operations into a lightweight intrinsic stage and an efficient ghost feature generation stage.
  • It significantly reduces parameters and FLOPs by combining 1×1 convolutions with inexpensive depthwise convolutions.
  • GhostConv has been successfully integrated into architectures like GRAN and YOLO11-4K, improving performance in super-resolution and high-resolution object detection.

GhostConv is a computationally efficient convolutional module that decomposes the standard convolutional operation into two stages: extraction of a reduced set of intrinsic feature maps using a lightweight convolution (typically $1 \times 1$), followed by generation of the remaining (ghost) feature maps using inexpensive linear transformations such as depthwise convolutions. Originally introduced to address feature redundancy and high computational cost in convolutional neural networks (CNNs), GhostConv underpins modern efficient architectures in tasks such as single-image super-resolution and real-time object detection on high-resolution images (Niu et al., 2023, Hafeez et al., 18 Dec 2025).

1. Mathematical Formulation and Core Principle

Let $X \in \mathbb{R}^{C \times H \times W}$ denote an input feature tensor, with $C$ channels and spatial dimensions $H \times W$. The standard convolution with $N$ output channels and kernel size $K$ is defined as:

$$Y = X \otimes f + b, \quad f \in \mathbb{R}^{N \times C \times K \times K}$$

The parameter count is $N \cdot C \cdot K^2$, and the runtime cost is similarly proportional to $N C K^2 H W$.

GhostConv modifies this using an expansion ratio $q$ (written $s$ in some notations). The process consists of:

  • Stage 1: Compute $m = N/q$ "intrinsic" feature maps using a $1 \times 1$ convolution:

$$Y' = X \otimes f', \quad f' \in \mathbb{R}^{m \times C \times 1 \times 1}$$

  • Stage 2: For each intrinsic map $y'_i$, generate $q$ total output maps via linear operators $\Phi_{i,j}$:

$$y_{i,j} = \Phi_{i,j}(y'_i), \quad i = 1, \dots, m, \; j = 1, \dots, q$$

$\Phi_{i,1}$ is typically the identity; the remaining operators are inexpensive, such as depthwise convolutions with kernel size $d \times d$.

The output $Y$ is assembled by stacking all ghost and intrinsic maps: $Y = [y_{1,1}, y_{1,2}, \dots, y_{m,q}] \in \mathbb{R}^{N \times H \times W}$, with $N = mq$.
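The channel bookkeeping of the two stages can be checked with a short script (a sketch; the function name is illustrative, and $q$ is assumed to divide $N$):

```python
def ghost_channels(n_out, q):
    """Split n_out output channels into intrinsic and ghost counts.

    Assumes q divides n_out (illustrative helper, not from the papers).
    """
    m = n_out // q             # intrinsic maps from the 1x1 convolution (stage 1)
    ghost = m * (q - 1)        # cheap maps from the depthwise operators (stage 2)
    assert m + ghost == n_out  # stacking recovers all N output maps
    return m, ghost

print(ghost_channels(64, 2))  # → (32, 32)
print(ghost_channels(64, 4))  # → (16, 48)
```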

2. Parameter and Computational Efficiency

By replacing the full convolution with a bottlenecked intrinsic convolution plus cheap operators, GhostConv reduces both parameter and compute cost:

  • Parameter count:

$$P_{\text{ghost}} = \frac{N}{q} \cdot C + (q - 1) \cdot \frac{N}{q} \cdot d^2 \quad \text{vs.} \quad P_{\text{std}} = N \cdot C \cdot K^2$$

For typical settings with $q = 2$ or $q = 4$, $K = 3$, and $d = 3$, the reduction can exceed $10\times$ for common channel widths (Niu et al., 2023).

  • FLOPs:

$$F_{\text{ghost}} = \left( \frac{N}{q} \cdot C + (q - 1) \cdot \frac{N}{q} \cdot d^2 \right) H W \quad \text{vs.} \quad F_{\text{std}} = N \cdot C \cdot K^2 \cdot H W$$

For $C = 64$, $N = 64$, $K = 3$, $d = 3$, and $q = 2$, the comparison is:

  • Standard: $64 \cdot 64 \cdot 3^2 = 36{,}864$ parameters
  • GhostConv: $32 \cdot 64 + 32 \cdot 3^2 = 2{,}336$ parameters (about $6\%$ of standard), with a similar FLOPs reduction.
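The counts in this section can be verified numerically; the sketch below assumes a bias-free $1 \times 1$ intrinsic convolution, and the function names are illustrative:

```python
def std_cost(c_in, n_out, k, h, w):
    """Parameters and multiply-accumulates of a standard k x k convolution."""
    params = n_out * c_in * k * k
    return params, params * h * w

def ghost_cost(c_in, n_out, q, d, h, w):
    """Parameters and MACs of GhostConv: 1x1 intrinsic stage plus cheap depthwise."""
    m = n_out // q                           # intrinsic channels N/q
    params = m * c_in + (q - 1) * m * d * d  # 1x1 conv + depthwise ghost stage
    return params, params * h * w

p_std, _ = std_cost(64, 64, 3, 56, 56)
p_gho, _ = ghost_cost(64, 64, 2, 3, 56, 56)
print(p_std, p_gho)                   # → 36864 2336
print(round(p_gho / p_std * 100, 1))  # → 6.3
```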

  • Empirical results: GRAN achieves roughly an order-of-magnitude reduction in parameters and FLOPs compared to RCAN on single-image super-resolution tasks, with negligible quality loss (Niu et al., 2023).

3. Implementation Variants and Instantiations

In GRAN for Super-Resolution

GhostConv is embedded within Ghost Residual Attention Blocks (GRAB). Each GRAB applies:

  1. First GhostConv
  2. ReLU activation
  3. Second GhostConv
  4. Channel-and-Spatial Attention Module (CSAM)
  5. Residual addition

GRABs are grouped into Ghost Residual Groups (GRGs), and the network structure further integrates skip connections at both group and global levels (Niu et al., 2023).
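The GRAB sequence above can be sketched as a composition of stand-in callables (purely illustrative scalar functions; the real blocks are GhostConv, ReLU, and CSAM layers):

```python
def make_grab(ghost1, relu, ghost2, csam):
    """Compose the five GRAB steps: GhostConv, ReLU, GhostConv, CSAM, residual add."""
    def grab(x):
        y = csam(ghost2(relu(ghost1(x))))
        return x + y  # step 5: residual addition of the block input
    return grab

# Toy scalar stand-ins, just to show the composition order.
grab = make_grab(lambda x: 2 * x, lambda x: max(x, 0.0), lambda x: x + 1, lambda x: x)
print(grab(3.0))  # → 10.0
```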

In YOLO11-4K for 4K Object Detection

GhostConv replaces several early standard convolutions in the backbone. Here:

  • The "ghost ratio" $q$ is often set to 2 (half the channels intrinsic, half ghost).
  • The cheap operators $\Phi_{i,j}$ are small-kernel depthwise convolutions.
  • The output is constructed as $Y = [Y', \Phi(Y')]$ (channel-wise concatenation), with cropping as necessary to hit the desired number of channels (Hafeez et al., 18 Dec 2025).

Pseudocode Example (as used in YOLO11-4K)

A minimal PyTorch sketch of the module (a reconstruction with ratio $q = 2$; layer names and defaults are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """1x1 intrinsic convolution, then cheap depthwise 'ghost' maps, concatenated."""
    def __init__(self, c_in, c_out, k=1, d=3):
        super().__init__()
        m = c_out // 2                      # intrinsic channels (q = 2)
        self.primary = nn.Conv2d(c_in, m, k, padding=k // 2, bias=False)
        self.cheap = nn.Conv2d(m, c_out - m, d, padding=d // 2,
                               groups=m, bias=False)  # depthwise cheap operator

    def forward(self, x):
        y = self.primary(x)                          # stage 1: intrinsic maps
        return torch.cat([y, self.cheap(y)], dim=1)  # stage 2: identity + ghosts
```

(Hafeez et al., 18 Dec 2025)

4. Empirical Evaluation and Practical Impact

Both GRAN and YOLO11-4K demonstrate that GhostConv can significantly decrease model size and latency with minimal performance trade-offs.

  • In GRAN (super-resolution on Set5): GRAN attains PSNR and SSIM on par with RCAN while using far fewer parameters and FLOPs (Niu et al., 2023).
  • In YOLO11-4K (object detection, 4K panoramic images): replacing early backbone convolutions with GhostConv reduces model size and latency while preserving detection accuracy (Hafeez et al., 18 Dec 2025).

A plausible implication is that GhostConv enables practical deployment of CNN-based vision systems in resource-constrained or real-time environments by reducing computational demands without compromising accuracy.

5. Design Guidelines and Generalization

GhostConv serves as a drop-in replacement for standard convolutional layers, especially in settings where channel-wise redundancy is present. Design guidelines include:

  • Select the ghost ratio $q$ (or $s$) so that $N/q$ is integer-valued.
  • Use $1 \times 1$ convolutions for intrinsic map extraction.
  • Employ depthwise convolutions with small kernels (e.g., $3 \times 3$, $5 \times 5$) for ghost feature generation.
  • In extremely resource-sensitive contexts, increasing $q$ further reduces parameters and latency; both scale approximately as $1/q$.
  • GhostConv layers integrate natively with existing attention mechanisms, normalization, and residual connections. (Niu et al., 2023, Hafeez et al., 18 Dec 2025)
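The approximate $1/q$ scaling from the guidelines can be seen by sweeping the ratio (a sketch, assuming a $1 \times 1$ intrinsic convolution and $d = 3$ depthwise kernels):

```python
def ghost_params(c_in, n_out, q, d=3):
    """GhostConv parameter count: 1x1 intrinsic stage plus cheap depthwise stage."""
    m = n_out // q                         # intrinsic channels N/q
    return m * c_in + (q - 1) * m * d * d

for q in (1, 2, 4, 8):
    print(q, ghost_params(64, 64, q))  # count falls roughly as 1/q in the 1x1 term
```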

6. Applicability, Limitations, and Extensions

GhostConv has been deployed in super-resolution and high-resolution object detection networks, but the principle is broadly applicable to CNN architectures where feature redundancy is suspected. When used appropriately, GhostConv can provide near state-of-the-art accuracy with an order-of-magnitude savings in both parameters and FLOPs. Limitations may arise in contexts where information loss due to the reduction in learned filters cannot be compensated by cheap linear operators; model designers should tune $q$ and kernel sizes in accordance with dataset complexity (Niu et al., 2023, Hafeez et al., 18 Dec 2025).


References

  • GRAN: Ghost Residual Attention Network for Single Image Super Resolution (Niu et al., 2023)
  • YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images (Hafeez et al., 18 Dec 2025)
