
GhostNets on Heterogeneous Devices via Cheap Operations (2201.03297v1)

Published 10 Jan 2022 in cs.CV

Abstract: Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to the limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices including CPU and GPU, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and then the lightweight C-GhostNet can be easily established. We further consider the efficient networks for GPU devices. Without involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to utilize the stage-wise feature redundancy to formulate GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts where the first part is processed using the original block with fewer output channels for generating intrinsic features, and the others are generated using cheap operations by exploiting stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and the G-Ghost stage. C-GhostNet and G-GhostNet can achieve the optimal trade-off of accuracy and latency for CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/CV-Backbones.

Authors (7)
  1. Kai Han (184 papers)
  2. Yunhe Wang (145 papers)
  3. Chang Xu (323 papers)
  4. Jianyuan Guo (40 papers)
  5. Chunjing Xu (66 papers)
  6. Enhua Wu (23 papers)
  7. Qi Tian (314 papers)
Citations (86)

Summary

Efficient Deployment of Neural Networks on Heterogeneous Devices: The GhostNet Approach

The deployment of convolutional neural networks (CNNs) on resource-constrained devices such as mobile phones remains a critical challenge. This paper introduces a novel architectural framework termed GhostNet, aimed at efficiently deploying CNNs on heterogeneous devices like CPUs and GPUs by leveraging feature map redundancy through cost-effective operations.

Design of Ghost Modules

The authors propose two main modules: CPU-efficient Ghost (C-Ghost) and GPU-efficient Ghost (G-Ghost). Both modules are designed to exploit the intrinsic redundancy in feature maps that typical CNNs generate.

C-Ghost Module: This module reduces computational overhead by splitting an ordinary convolutional layer into two components. The first generates a reduced number of intrinsic feature maps using conventional convolutions. The second applies inexpensive linear operations (cheap per-channel transforms) to these intrinsic maps to synthesize additional "ghost" maps, approximating the output of the full convolution at a fraction of the cost. This mechanism significantly decreases parameters and FLOPs without compromising network performance.
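The cost saving can be seen from a back-of-the-envelope FLOPs comparison. The sketch below follows the paper's setup: with ratio s, only c_out/s maps come from the ordinary convolution and the remaining (s-1)·c_out/s ghost maps come from cheap d×d per-channel transforms, so the speedup approaches s. The layer sizes chosen here are illustrative, not from the paper.

```python
def conv_flops(c_in, c_out, h, w, k):
    """Multiply-accumulate count of an ordinary k x k convolution."""
    return c_out * h * w * c_in * k * k

def ghost_flops(c_in, c_out, h, w, k, s, d):
    """C-Ghost module cost: c_out // s intrinsic maps from a normal
    convolution, plus (s - 1) * c_out // s ghost maps produced by
    cheap d x d per-channel (depth-wise) linear transforms."""
    intrinsic = c_out // s
    primary = intrinsic * h * w * c_in * k * k   # ordinary conv part
    cheap = (s - 1) * intrinsic * h * w * d * d  # cheap transforms
    return primary + cheap

# Illustrative layer: 256 -> 256 channels on a 56x56 map, 3x3 kernels, s = 2
plain = conv_flops(256, 256, 56, 56, 3)
ghost = ghost_flops(256, 256, 56, 56, 3, s=2, d=3)
print(round(plain / ghost, 2))  # -> 1.99, i.e. close to s = 2
```

Because the cheap transforms ignore the (large) input channel count c_in, their cost is negligible next to the primary convolution, which is why the overall speedup ratio tends to s.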

G-Ghost Module: Tailored for GPU environments, this module addresses latency challenges associated with GPU-inefficient operations like depth-wise convolution. The G-Ghost design splits a stage's features into two paths: a primary path producing essential feature maps with fewer output channels, and a secondary path generating redundant ghost features using cost-effective operations. This stage-wise strategy suits GPUs better because it keeps the ratio of memory accesses to computation low.
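The stage split can be sketched in a few lines. This is a toy numpy illustration of the channel bookkeeping, not the paper's implementation: the "complex" path runs the stage's block with fewer output channels, the "cheap" path derives the remaining channels from the stage input with a per-pixel (1x1) linear transform, and the two are concatenated. All weights here are random placeholders.

```python
import numpy as np

def g_ghost_stage(x, block, c_out, ratio=0.5, rng=None):
    """Toy G-Ghost stage on a (C, H, W) feature map.

    'block' is the expensive path (run with fewer output channels);
    the remaining ghost channels come from a cheap 1x1 transform of
    the stage input. Weights are random stand-ins for illustration.
    """
    rng = rng or np.random.default_rng(0)
    c_complex = int(c_out * ratio)       # intrinsic channels
    c_cheap = c_out - c_complex          # ghost channels
    y_complex = block(x, c_complex)      # expensive path
    w = rng.standard_normal((c_cheap, x.shape[0])) * 0.1
    y_cheap = np.einsum('oc,chw->ohw', w, x)  # cheap 1x1 transform
    return np.concatenate([y_complex, y_cheap], axis=0)

def toy_block(x, c, rng=np.random.default_rng(1)):
    """Stand-in for a stage's block: a random 1x1 projection."""
    w = rng.standard_normal((c, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

x = np.ones((32, 8, 8))                  # 32-channel input feature map
y = g_ghost_stage(x, toy_block, c_out=64)
print(y.shape)  # -> (64, 8, 8)
```

Only half the output channels pass through the expensive block; the rest avoid depth-wise operations entirely, which is the property that keeps the design GPU-friendly.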

Performance and Implementation

Experiments on CIFAR-10 and ImageNet demonstrate substantial gains in efficiency and speed when the proposed modules are incorporated into existing architectures such as VGG and ResNet:

  • C-GhostNet achieves up to a 2x reduction in computation on standard metrics with negligible accuracy loss. For example, Ghost-ResNet50 achieves comparable accuracy to standard ResNet50 while reducing computation by over 40%.
  • G-GhostNet enhances GPU-based speed significantly, attaining latency reductions of up to 16% on GPUs compared to baseline models, without sacrificing model accuracy.

Further, GhostNet's adaptability and plug-and-play nature make it applicable to diverse architectures, enhancing their efficiency across tasks such as object detection in MS COCO.

Implications and Future Applications

The paper makes substantive contributions to model efficiency, promising broad implications:

  • Practical Deployability: GhostNet facilitates deployment on mobile and embedded systems by optimizing both computational and memory efficiencies.
  • Architectural Flexibility: Its modular design allows seamless integration into different network backbones, offering potential optimizations for diverse applications beyond image classification.
  • Theoretical Insights: The delineation of redundancy exploitation in feature maps could inspire further research into reducing unnecessary computations in deep learning models.

Moving forward, potential extensions of this work may include automating the design of Ghost modules for various architectures through neural architecture search or expanding the scope to other domains like natural language processing, where computation cost is a significant concern.

In conclusion, GhostNet represents a robust step toward sustainable, efficient neural network deployment on varying computational platforms, offering a compelling blend of theoretical innovation and practical utility.