- The paper presents a composite backbone strategy that connects multiple identical backbones to enhance feature extraction for object detection.
- It demonstrates improved performance, achieving up to a 3% boost in mean Average Precision on the MS-COCO benchmark.
- The architecture integrates seamlessly with various detectors like Mask R-CNN and Cascade R-CNN, offering a cost-effective upgrade.
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
The paper "CBNet: A Novel Composite Backbone Network Architecture for Object Detection" presents a significant contribution to the field of computer vision by introducing an innovative approach to enhance the performance of object detection models. This paper focuses on improving backbone networks, a critical component in CNN-based detectors responsible for basic feature extraction. It builds upon established backbones like ResNet and ResNeXt through a new strategy termed Composite Backbone Network (CBNet).
Architecture and Implementation
CBNet assembles multiple identical backbones into a composite structure, using composite connections to link the stages of adjacent backbones. Each backbone after the first takes as input, at every stage, both the output of its own previous stage and the composed output of the corresponding stage of its predecessor. This iterative, stage-by-stage propagation enriches the feature representation that ultimately feeds the detection head.
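To make the propagation concrete, here is a minimal PyTorch sketch, assuming the adjacent higher-level composition style the paper reports as best (a 1x1 convolution with batch normalization followed by upsampling, added to the next backbone's stage input). The names CompositeConnection and cbnet_forward are illustrative rather than taken from the authors' code, and the stem stage is left uncomposed for simplicity.

```python
# Minimal sketch of CBNet-style stage-by-stage composition (not the authors' code).
# Assumes the adjacent higher-level composite style: the stage-l output of the
# previous backbone is projected, resized, and added to the stage-l input of the
# next backbone. All names below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompositeConnection(nn.Module):
    """Project an assistant-backbone feature map (1x1 conv + BN) and resize it
    to the spatial size of the receiving stage's input."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, assistant_feat: torch.Tensor, target_size) -> torch.Tensor:
        x = self.bn(self.conv(assistant_feat))
        return F.interpolate(x, size=target_size, mode="nearest")


def cbnet_forward(backbones, composites, image):
    """Run K identical backbones in sequence.

    `backbones` is a list of stage lists (stage 0 is the stem); `composites[l]`
    is the CompositeConnection feeding stage l. Every backbone after the first
    adds the composed stage-l output of its predecessor to its own stage-l input.
    Returns the stage outputs of the last (lead) backbone.
    """
    prev_feats = None  # stage outputs of the preceding (assistant) backbone
    for k, stages in enumerate(backbones):
        x, feats = image, []
        for l, stage in enumerate(stages):
            if k > 0 and l > 0:  # the stem (stage 0) is left uncomposed here
                x = x + composites[l](prev_feats[l], x.shape[-2:])
            x = stage(x)
            feats.append(x)
        prev_feats = feats
    return prev_feats  # lead-backbone features, e.g. handed to an FPN or detection head
```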
Key features of the CBNet architecture include:
- Multiple Identical Backbones: The architecture chains several copies of the same backbone; the earlier copies act as Assistant Backbones, while the final Lead Backbone produces the features used for detection.
- Composite Connections: Each connection takes the output of an Assistant Backbone stage, processes it (in the paper, a 1x1 convolution with batch normalization followed by upsampling), and adds it to the input of the corresponding stage of the succeeding backbone, enriching the features at every level.
- Ease of Integration: CBNet can be integrated into various state-of-the-art detectors such as FPN, Mask R-CNN, and Cascade R-CNN with little effort, improving performance across these models; a construction sketch follows this list.
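Continuing the sketch above, the snippet below shows one hypothetical way to assemble a Dual-Backbone CBNet from two torchvision ResNet-50 models. The stage split, channel counts, and the make_stages helper are assumptions made for illustration, and the FPN or detection head that would consume the resulting features is omitted.

```python
# Hypothetical Dual-Backbone construction using torchvision ResNet-50 stages,
# reusing CompositeConnection and cbnet_forward from the sketch above.
from torchvision.models import resnet50


def make_stages(net: nn.Module) -> nn.ModuleList:
    """Split a torchvision ResNet into a stem plus its four residual stages."""
    stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
    return nn.ModuleList([stem, net.layer1, net.layer2, net.layer3, net.layer4])


backbones = nn.ModuleList([make_stages(resnet50()) for _ in range(2)])  # Dual-Backbone

# Output channels of the ResNet-50 stages: stem, layer1..layer4.
channels = [64, 256, 512, 1024, 2048]
# composites[l] maps the assistant's stage-l output to the channel count of the
# lead backbone's stage-l input (stage 0 is not composed, hence the placeholder).
composites = nn.ModuleList(
    [nn.Identity()]
    + [CompositeConnection(channels[l], channels[l - 1]) for l in range(1, 5)]
)

feats = cbnet_forward(backbones, composites, torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])  # five feature maps an FPN-style detector could consume
```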
Empirical Evaluation
The paper presents empirical results demonstrating substantial improvements in mean Average Precision (mAP) on the MS-COCO benchmark. For instance, when integrated into Cascade Mask R-CNN, CBNet achieved an mAP of 53.3, outperforming previously reported results. This improvement is attributed to CBNet's ability to combine high- and low-level features effectively, strengthening the feature representation needed for accurate object detection.
Comparison and Ablation Studies
The authors conducted comprehensive comparisons with alternative composite styles and assessed the impact of various numbers of backbones in CBNet. Findings indicate:
- Performance Improvements: The CBNet architecture consistently increased model mAP by 1.5 to 3 percent across different detectors.
- Optimal Configuration: Accuracy continues to rise as more backbones are added, but practical constraints on speed and memory make the Dual-Backbone and Triple-Backbone configurations the recommended choices.
- Efficient Feature Utilization: When weights are shared between backbones, most of the accuracy gain is retained while the parameter count barely grows, indicating that the benefit comes from the composite architecture rather than from mere parameter expansion (a shared-weight sketch follows this list).
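As a rough illustration of the shared-weight ablation, and reusing make_stages, CompositeConnection, composites, and cbnet_forward from the sketches above, the same stage modules can simply be run twice, so that only the composite connections contribute extra parameters. This mirrors the idea of the ablation rather than the authors' exact setup.

```python
# Hypothetical shared-weight Dual-Backbone: the same stage modules are reused for
# both passes, so apart from the composite connections no new parameters are added.
shared_stages = make_stages(resnet50())
backbones_shared = nn.ModuleList([shared_stages, shared_stages])  # one set of weights, two passes

feats = cbnet_forward(backbones_shared, composites, torch.randn(1, 3, 224, 224))
```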
Practical Implications and Future Directions
The proposed CBNet offers a practical way to improve object detection without designing and pre-training expensive new backbones. It reuses existing pre-trained models, accepting some extra computation and memory in exchange for higher accuracy.
Future work may focus on further reducing CBNet's computational cost and on adapting it to vision tasks beyond object detection. Exploring different backbone architectures within the CBNet framework could also yield further insight into maximizing feature utilization and detection accuracy.
In conclusion, this paper exemplifies a methodical approach to improving a critical component of object detection systems, setting a new benchmark for subsequent advancements in the field.