CBNet: A Composite Backbone Network Architecture for Object Detection (2107.00420v7)

Published 1 Jul 2021 in cs.CV

Abstract: Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. In this paper, we propose a novel and flexible backbone framework, namely CBNetV2, to construct high-performance detectors using existing open-sourced pre-trained backbones under the pre-training fine-tuning paradigm. In particular, CBNetV2 architecture groups multiple identical backbones, which are connected through composite connections. Specifically, it integrates the high- and low-level features of multiple backbone networks and gradually expands the receptive field to more efficiently perform object detection. We also propose a better training strategy with assistant supervision for CBNet-based detectors. Without additional pre-training of the composite backbone, CBNetV2 can be adapted to various backbones (CNN-based vs. Transformer-based) and head designs of most mainstream detectors (one-stage vs. two-stage, anchor-based vs. anchor-free-based). Experiments provide strong evidence that, compared with simply increasing the depth and width of the network, CBNetV2 introduces a more efficient, effective, and resource-friendly way to build high-performance backbone networks. Particularly, our Dual-Swin-L achieves 59.4% box AP and 51.6% mask AP on COCO test-dev under the single-model and single-scale testing protocol, which is significantly better than the state-of-the-art result (57.7% box AP and 50.2% mask AP) achieved by Swin-L, while the training schedule is reduced by 6$\times$. With multi-scale testing, we push the current best single model result to a new record of 60.1% box AP and 52.3% mask AP without using extra training data. Code is available at https://github.com/VDIGPKU/CBNetV2.

Summary

The paper introduces the Composite Backbone Network (CBNet), which groups multiple identical pre-trained backbones and integrates their high- and low-level features for object detection.
On COCO test-dev, Dual-Swin-L reaches 59.4% box AP and 51.6% mask AP with single-model, single-scale testing, without additional pre-training of the composite backbone.
The framework works with CNN-based and Transformer-based backbones and with most mainstream detector heads, making it a versatile and resource-efficient option for object detection.

A Composite Backbone Network Architecture for Object Detection

The paper "CBNet: A Composite Backbone Network Architecture for Object Detection" presents a framework, referred to as CBNetV2 in the abstract, that improves object detectors by composing several existing pre-trained backbone networks into a single, more effective feature extractor.

Key Contributions

The primary contribution of this work is the introduction of the Composite Backbone Network (CBNet) architecture. CBNet groups multiple identical backbones, connecting them through composite connections. This composition strategy enhances the network's ability to integrate multi-level features, thereby expanding the receptive field progressively. The approach does not require additional pre-training, making it resource-efficient compared to traditional methods that simply increase the network's width or depth.
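The grouping of identical backbones through composite connections can be illustrated with a toy sketch. Everything below is an assumption made for illustration: feature maps are 1-D lists, each stage is a weight-free average pooling, and the composite connection is plain nearest-neighbour upsampling followed by addition, whereas the real model uses learned layers. The summation over the same and higher levels of the previous backbone is one reading of the dense higher-level pattern, not a transcription of the authors' code, and the function names are hypothetical.

```python
from typing import List

Feature = List[float]  # toy 1-D stand-in for a feature map


def downsample_stage(x: Feature) -> Feature:
    """Toy backbone stage: average adjacent pairs, halving the resolution."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample_to(x: Feature, size: int) -> Feature:
    """Nearest-neighbour upsampling of x to the requested length."""
    return [x[i * len(x) // size] for i in range(size)]


def composite_forward(image: Feature, num_backbones: int, num_stages: int) -> List[Feature]:
    """Run several identical toy backbones in sequence.

    Before each stage of a later backbone, the outputs of the previous
    backbone at that level and above are upsampled and added to the stage
    input (assumed dense higher-level wiring).
    """
    prev_feats: List[Feature] = []
    for _ in range(num_backbones):
        feats: List[Feature] = []
        x = image
        for level in range(num_stages):
            # The first backbone has no predecessor, so nothing is added.
            for higher in prev_feats[level:]:
                up = upsample_to(higher, len(x))
                x = [a + b for a, b in zip(x, up)]
            x = downsample_stage(x)
            feats.append(x)
        prev_feats = feats
    # Features of the last backbone are what a detector head would consume.
    return prev_feats


single = composite_forward([1.0] * 8, num_backbones=1, num_stages=3)
dual = composite_forward([1.0] * 8, num_backbones=2, num_stages=3)
print(single)
print(dual)
```

With a single backbone the function reduces to an ordinary feed-forward pass; adding a second backbone changes only the stage inputs, which is why existing pre-trained weights can be reused for every backbone in the group.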

Architecture and Design

The CBNet framework includes several notable design elements:

Composite Connections: The architecture employs diverse composite strategies, such as Dense Higher-Level Composition (DHLC), to effectively combine features from multiple backbones.
Auxiliary Supervision: Auxiliary detection heads (the "assistant supervision" of the abstract) add extra training signals, which regularize the optimization of the composite backbone.
Pruning Strategy: To manage model complexity, a pruning strategy is introduced, selectively pruning stages without sacrificing accuracy.

Together, these components let CBNet be applied across different backbone and detector architectures, which supports the paper's claim that the approach generalizes.
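The auxiliary supervision described above amounts to adding weighted losses from the auxiliary heads to the loss of the main detection head. A minimal sketch follows, assuming one scalar loss per head; the function name, argument names, and the example weight of 0.5 are hypothetical and not taken from the paper.

```python
from typing import Sequence


def total_detection_loss(lead_loss: float,
                         assist_losses: Sequence[float],
                         assist_weights: Sequence[float]) -> float:
    """Combine the main head's loss with weighted auxiliary-head losses."""
    if len(assist_losses) != len(assist_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    weighted = sum(w * a for w, a in zip(assist_weights, assist_losses))
    return lead_loss + weighted


# One auxiliary head, weighted at 0.5 (an illustrative value only).
loss = total_detection_loss(lead_loss=2.0, assist_losses=[1.0], assist_weights=[0.5])
print(loss)
```

Setting every weight to zero recovers training with the main head alone, which makes the weights a convenient knob for ablating the auxiliary signal.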

Performance and Evaluation

In experimental evaluations on the MS COCO benchmark, CBNet improves mainstream detectors such as Faster R-CNN and Mask R-CNN without additional pre-training of the composite backbone. Notably, Dual-Swin-L achieves 59.4% box AP and 51.6% mask AP on COCO test-dev under single-model, single-scale testing, surpassing the previous state-of-the-art result of Swin-L (57.7% box AP and 50.2% mask AP) while reducing the training schedule sixfold. With multi-scale testing, it achieves 60.1% box AP and 52.3% mask AP, a new single-model record at the time, obtained without extra training data.

These results show a gain of 1.7 points in box AP and 1.4 points in mask AP over Swin-L, which the paper attributes to the composite backbone design. CBNet is therefore a practical way to raise detection accuracy without the cost of pre-training a new, larger backbone.
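The single-scale margin over Swin-L follows directly from the figures quoted in the abstract. The small check below recomputes it; the dictionary layout and the helper name are illustrative choices only.

```python
# Single-model, single-scale COCO test-dev results quoted in the abstract.
reported = {
    "Dual-Swin-L": {"box_ap": 59.4, "mask_ap": 51.6},
    "Swin-L": {"box_ap": 57.7, "mask_ap": 50.2},
}


def ap_gain(new: dict, baseline: dict) -> dict:
    """Per-metric improvement in AP points, rounded to one decimal."""
    return {metric: round(new[metric] - baseline[metric], 1) for metric in new}


gains = ap_gain(reported["Dual-Swin-L"], reported["Swin-L"])
print(gains)
```

The multi-scale figures (60.1% box AP, 52.3% mask AP) are left out of the comparison because the quoted Swin-L baseline is a single-scale result.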

Implications and Future Directions

The implications of CBNet are multifaceted:

Resource Efficiency: By constructing high-performance detectors using existing pre-trained models, CBNet avoids the high costs associated with training new models from scratch.
Scalability: The framework demonstrates strong scalability, allowing for extensions to larger composite networks, further enhancing detection accuracy while maintaining efficiency.
Compatibility: CBNet is compatible with both CNN-based and Transformer-based backbones and with the heads of most mainstream detectors (one-stage or two-stage, anchor-based or anchor-free), so it can be integrated into a wide range of detection pipelines.

Future developments in this domain may explore extending CBNet to other vision tasks beyond object detection, such as scene segmentation and video analysis. Additionally, further research could focus on refining the composition strategies and exploring alternative backbone architectures to broaden the applicability and performance of CBNet.

In summary, CBNet presents a compelling approach to advancing object detection technology through innovative use of composite network architectures, achieving substantial performance improvements with resource-conserving strategies.

GitHub

GitHub - VDIGPKU/CBNetV2: [TIP 2022] CBNetV2: A Composite Backbone Network Architecture for Object Detection (378 stars)