
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection (1904.09730v1)

Published 22 Apr 2019 in cs.CV

Abstract: As DenseNet conserves intermediate features with diverse receptive fields by aggregating them with dense connection, it shows good performance on the object detection task. Although feature reuse enables DenseNet to produce strong features with a small number of model parameters and FLOPs, the detector with DenseNet backbone shows rather slow speed and low energy efficiency. We find the linearly increasing input channel by dense connection leads to heavy memory access cost, which causes computation overhead and more energy consumption. To solve the inefficiency of DenseNet, we propose an energy and computation efficient architecture called VoVNet comprised of One-Shot Aggregation (OSA). The OSA not only adopts the strength of DenseNet that represents diversified features with multi receptive fields but also overcomes the inefficiency of dense connection by aggregating all features only once in the last feature maps. To validate the effectiveness of VoVNet as a backbone network, we design both lightweight and large-scale VoVNet and apply them to one-stage and two-stage object detectors. Our VoVNet based detectors outperform DenseNet based ones with 2x faster speed and the energy consumptions are reduced by 1.6x - 4.1x. In addition to DenseNet, VoVNet also outperforms widely used ResNet backbone with faster speed and better energy efficiency. In particular, the small object detection performance has been significantly improved over DenseNet and ResNet.

Citations (319)

Summary

  • The paper introduces VoVNet with one-shot aggregation to overcome DenseNet's heavy energy and GPU computation inefficiencies.
  • It achieves double the detection speed and reduces energy consumption by 1.6 to 4.1 times compared to DenseNet counterparts.
  • One-shot aggregation keeps the input channel count of intermediate layers constant, which lowers memory access cost and computational overhead.

Insights into an Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

The paper presents an approach to improving the efficiency of backbone networks for real-time object detection. DenseNet has been used as a backbone for this task because its dense connections preserve and reuse intermediate features with diverse receptive fields, which yields strong detection accuracy with few parameters and FLOPs. Despite these advantages, DenseNet-based detectors are slow and energy-hungry. The authors attribute this mainly to heavy memory access cost: because every layer in a dense block receives the concatenated outputs of all preceding layers, the number of input channels grows linearly with depth. This inefficiency is a significant barrier to deploying DenseNet in real-time applications and motivates a different aggregation scheme.
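The linear growth described above can be written out explicitly. The following is a sketch under assumed notation that does not appear in the summary: $c_0$ is the number of channels entering a dense block, $k$ is the number of channels each layer adds (DenseNet's growth rate), and $L$ is the number of layers in the block.

```latex
% Input channels seen by the l-th layer of a dense block (l = 1, ..., L):
c_{\mathrm{in}}(\ell) = c_0 + (\ell - 1)\,k

% Total input channels read across the whole block:
\sum_{\ell=1}^{L} c_{\mathrm{in}}(\ell) = L\,c_0 + k\,\frac{L(L-1)}{2}
```

The per-layer input grows linearly in $\ell$, so the total read across the block grows quadratically in $L$, which is one way to see why memory traffic rises faster than the parameter count suggests.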

To address these challenges, the authors introduce VoVNet, a network architecture built from One-Shot Aggregation (OSA) modules. The OSA module is designed to retain DenseNet's strength, features with multiple receptive fields, while avoiding the cost of dense connections. Instead of concatenating earlier outputs into every intermediate layer, the OSA module aggregates all intermediate features only once, in the last feature map of the module. As a result, the input size of the intermediate layers stays constant, which reduces memory access cost and computation overhead and improves GPU-computation efficiency.
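To make the contrast concrete, here is a minimal Python sketch comparing the input channels each layer sees in a dense block and in an OSA-style module. Everything in it is an illustrative assumption rather than the paper's configuration: the function names, the toy sizes (64 input channels, 5 layers of 32 channels, a 56x56 feature map), the choice to include the block input in the final concatenation, and the simplified memory-access estimate `h*w*(c_in + c_out) + k*k*c_in*c_out` for a single convolution.

```python
def dense_input_channels(c0, growth, num_layers):
    """Input channels per layer when each layer sees all earlier outputs."""
    return [c0 + i * growth for i in range(num_layers)]


def osa_input_channels(c0, width, num_layers):
    """Input channels per layer when each layer sees only its predecessor."""
    return [c0] + [width] * (num_layers - 1)


def conv_mac(h, w, c_in, c_out, k=3):
    """Rough memory-access estimate for one k x k convolution (assumed model):
    input and output feature maps plus the weight tensor."""
    return h * w * (c_in + c_out) + k * k * c_in * c_out


def block_mac(h, w, inputs, c_out, k=3):
    """Sum the estimate over layers that each emit c_out channels."""
    return sum(conv_mac(h, w, c_in, c_out, k) for c_in in inputs)


# Toy sizes, chosen only for illustration.
H = W = 56
C0, WIDTH, LAYERS, AGG_OUT = 64, 32, 5, 128

dense_inputs = dense_input_channels(C0, WIDTH, LAYERS)
osa_inputs = osa_input_channels(C0, WIDTH, LAYERS)

dense_total = block_mac(H, W, dense_inputs, WIDTH)

# OSA pays for one extra 1x1 convolution over the single concatenation
# of the block input and all layer outputs (assumed layout).
osa_total = block_mac(H, W, osa_inputs, WIDTH) + conv_mac(
    H, W, C0 + LAYERS * WIDTH, AGG_OUT, k=1
)

if __name__ == "__main__":
    print("dense inputs:", dense_inputs)
    print("osa inputs:  ", osa_inputs)
    print("dense MAC estimate:", dense_total)
    print("osa MAC estimate:  ", osa_total)
```

With these toy numbers the dense block's inputs widen at every layer while the OSA inputs stay flat after the first layer, and the OSA estimate remains lower even after counting the final aggregation. The actual savings depend on the real layer widths and on hardware behavior that this estimate does not model.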

The authors validate VoVNet in both lightweight and large-scale configurations, applied to one-stage and two-stage object detectors. VoVNet-based detectors run about twice as fast as their DenseNet counterparts and reduce energy consumption by a factor of 1.6 to 4.1. VoVNet also outperforms the widely used ResNet backbone in both speed and energy efficiency. These results suggest the architecture is practical for energy-critical and computation-constrained environments.

More broadly, the paper underscores the importance of rethinking feature aggregation strategies in convolutional neural networks. Moving from dense intermediate aggregation to a single aggregation step exposes the trade-off between feature reuse and operational efficiency. OSA's ability to preserve diverse features with multiple receptive fields is a promising direction for future research, particularly given the reported improvement in small-object detection over both DenseNet and ResNet.

Looking forward, VoVNet sets a foundation for developing networks that are computationally less intensive yet effective. As real-time applications become pervasive, the importance of energy and computation-efficient models will greatly increase, given the energy constraints in edge computing environments. The architectural shifts elucidated in this research could inspire future designs of network architectures that are not only efficient but also highly scalable to diverse computational platforms.

In sum, VoVNet is a significant step toward efficient and practical deep learning models for real-time object detection, particularly in energy-constrained settings. Future work could extend its principles to other tasks built on the same kind of backbone, such as semantic segmentation.