Efficient Deployment of Neural Networks on Heterogeneous Devices: The GhostNet Approach
The deployment of convolutional neural networks (CNNs) on resource-constrained devices such as mobile phones remains a critical challenge. This paper introduces a novel architectural framework, termed GhostNet, aimed at deploying CNNs efficiently across heterogeneous hardware, from CPUs to GPUs, by exploiting feature-map redundancy through cost-effective operations.
Design of Ghost Modules
The authors propose two main modules: CPU-efficient Ghost (C-Ghost) and GPU-efficient Ghost (G-Ghost). Both modules are designed to exploit the intrinsic redundancy in feature maps that typical CNNs generate.
C-Ghost Module: This module reduces computational overhead by splitting an ordinary convolutional layer into two components. The first component generates a reduced set of intrinsic feature maps using conventional convolution with fewer output channels. The second component applies inexpensive linear operations to the intrinsic maps to synthesize additional "ghost" feature maps, so that the full set of output maps is approximated at a fraction of the cost. This mechanism significantly decreases parameters and FLOPs without compromising network performance.
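The cost saving of this split can be seen with a back-of-the-envelope FLOP count. The sketch below compares an ordinary convolution against a C-Ghost-style layer that produces 1/s of the channels with a normal convolution and the rest with cheap per-channel d x d operations; the layer sizes, and the ratio s = 2, are illustrative assumptions, not values taken from the paper.

```python
def conv_flops(c_in, c_out, h, w, k):
    """Multiply-accumulate count of a standard k x k convolution."""
    return c_out * h * w * c_in * k * k

def ghost_flops(c_in, c_out, h, w, k, s, d):
    """C-Ghost style: c_out/s intrinsic maps via an ordinary conv,
    the remaining (s-1)/s via cheap d x d per-channel linear ops."""
    intrinsic = c_out // s
    primary = conv_flops(c_in, intrinsic, h, w, k)   # ordinary conv, fewer channels
    cheap = (s - 1) * intrinsic * h * w * d * d      # depthwise-style cheap ops
    return primary + cheap

# Hypothetical layer: 256 -> 256 channels on a 32x32 map, 3x3 kernels, s = 2.
base = conv_flops(256, 256, 32, 32, 3)
ghost = ghost_flops(256, 256, 32, 32, 3, s=2, d=3)
print(f"speedup: {base / ghost:.2f}x")  # → speedup: 1.99x
```

For large input channel counts the cheap-path term becomes negligible and the speedup approaches s, which is why halving the intrinsic channels (s = 2) yields roughly a 2x reduction in computation.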
G-Ghost Module: Tailored for GPU environments, this module addresses the latency penalty of GPU-inefficient operations such as depth-wise convolution. The G-Ghost design splits a stage's features into two paths: a primary path that produces the essential feature maps through the full block stack with fewer output channels, and a secondary path that generates the redundant ghost features with cost-effective operations. This stage-wise strategy suits GPUs better because it avoids fragmented, memory-bound operators and keeps the ratio of memory accesses to computation low.
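The two-path stage structure can be sketched in a few lines of NumPy. This is a minimal, shape-only illustration under assumed sizes: the "blocks" are stand-in random channel-mixing maps rather than real convolutions, and the split ratio (here half the channels routed through the cheap path) is a hypothetical choice, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_block(x, c_out):
    """Stand-in for a convolution block: a random channel-mixing map."""
    w = rng.standard_normal((c_out, x.shape[0]))
    return w @ x

c_out = 64                                  # total stage output channels
ghost_c = c_out // 2                        # channels from the cheap path
primary_c = c_out - ghost_c                 # channels from the full path

x = rng.standard_normal((32, 16 * 16))      # 32 channels, flattened 16x16 map

# Primary path: the full stack of blocks, but with fewer output channels.
h = linear_block(x, primary_c)
h = linear_block(h, primary_c)

# Ghost path: a single inexpensive map from the stage's early features.
g = linear_block(x, ghost_c)

# Concatenate along the channel axis to form the stage output.
y = np.concatenate([h, g], axis=0)
print(y.shape)  # → (64, 256)
```

Because the ghost path skips most of the stage's blocks, the bulk of the computation runs at reduced width while the output keeps its full channel count, which is the source of the latency savings on GPUs.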
Performance and Implementation
Experiments on CIFAR-10 and ImageNet using the GhostNet architecture demonstrate impressive gains in efficiency and speed. When incorporated into existing architectures like VGG and ResNet:
- C-GhostNet achieves up to a 2x reduction in FLOPs with negligible accuracy loss. For example, Ghost-ResNet50 matches the accuracy of a standard ResNet50 while cutting computation by over 40%.
- G-GhostNet reduces GPU inference latency by up to 16% compared to baseline models, without sacrificing accuracy.
Further, GhostNet's adaptability and plug-and-play nature make it applicable to diverse architectures, enhancing their efficiency across tasks such as object detection in MS COCO.
Implications and Future Applications
The paper makes substantive contributions to model efficiency, promising broad implications:
- Practical Deployability: GhostNet facilitates deployment on mobile and embedded systems by optimizing both computational and memory efficiencies.
- Architectural Flexibility: Its modular design allows seamless integration into different network backbones, offering potential optimizations for diverse applications beyond image classification.
- Theoretical Insights: The delineation of redundancy exploitation in feature maps could inspire further research into reducing unnecessary computations in deep learning models.
Moving forward, potential extensions of this work may include automating the design of Ghost modules for various architectures through neural architecture search or expanding the scope to other domains like natural language processing, where computation cost is a significant concern.
In conclusion, GhostNet represents a robust step toward sustainable, efficient neural network deployment on varying computational platforms, offering a compelling blend of theoretical innovation and practical utility.