Object Detection Networks on Convolutional Feature Maps (1504.06066v2)

Published 23 Apr 2015 in cs.CV

Abstract: Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them "Networks on Convolutional feature maps" (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier. We show by experiments that despite the effective ResNets and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in ImageNet and MS COCO challenges 2015.

Summary

The paper demonstrates that a deeper convolutional region-wise classifier significantly boosts detection accuracy, as evidenced by improvements in VGG-16 models.
Experiments reveal that convolutional classifiers outperform MLP-based ones, effectively reducing localization errors in object detection tasks.
The integration of NoC designs with robust backbones like ResNet paves the way for developing modular and efficient object detectors in future research.

An Analysis of Object Detection Networks on Convolutional Feature Maps

The paper "Object Detection Networks on Convolutional Feature Maps" explores architectural designs for object detection systems, focusing on how convolutional networks (ConvNets) and region-wise classifiers fit together. The authors examine the conventional two-component structure of object detectors (a feature extractor and an object classifier) and argue that the classifier deserves as much design attention as the feature extractor. Their research centers on what they term "Networks on Convolutional feature maps" (NoCs): per-region classifier networks that operate on shared, region-independent convolutional features.
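
To make the idea of "shared, region-independent features plus a per-region classifier" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel feature map stored as nested lists and a hypothetical helper `roi_max_pool` that pools any region of that map into a fixed-size grid, so that a downstream classifier (MLP or convolutional) always sees inputs of the same shape. The bin-splitting rule and the 2x2 output size are illustrative assumptions.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]      # single-channel H x W map (assumption)
Roi = Tuple[int, int, int, int]     # (x0, y0, x1, y1), end-exclusive, in cells


def roi_max_pool(feature_map: FeatureMap, roi: Roi,
                 out_size: int = 2) -> List[List[float]]:
    """Max-pool one region of a shared feature map into out_size x out_size."""
    x0, y0, x1, y1 = roi
    width, height = x1 - x0, y1 - y0
    if width < out_size or height < out_size:
        raise ValueError("region is smaller than the output grid")

    pooled = []
    for i in range(out_size):
        # Rows covered by output bin i (integer split of the region height).
        r0 = y0 + (i * height) // out_size
        r1 = y0 + ((i + 1) * height) // out_size
        row = []
        for j in range(out_size):
            # Columns covered by output bin j.
            c0 = x0 + (j * width) // out_size
            c1 = x0 + ((j + 1) * width) // out_size
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# The feature map is computed once per image and reused for every region.
feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
whole_image = roi_max_pool(feature_map, (0, 0, 4, 4))
sub_region = roi_max_pool(feature_map, (1, 1, 4, 3))
print(whole_image)
print(sub_region)
```

Both regions yield a 2x2 grid despite their different sizes, which is what allows a single region-wise classifier to be applied to every proposal. The paper's question is what that classifier should look like once the pooled features are available.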

Key Contributions and Observations

The authors investigate the architecture of the region-wise classifier and report three primary observations:

Importance of Classifier Depth: Detection accuracy depends not only on better convolutional features but equally on a deeper region-wise classifier. AlexNet- and VGG-style architectures have multiple fully-connected (fc) layers that can serve as the region-wise classifier, whereas GoogLeNet and ResNet have no hidden fc layers and so do not supply such a classifier by default.
Convolutional vs. MLP-based Classifiers: Experiments reveal that a convolutional region-wise classifier significantly improves detection performance compared to a multi-layer perceptron (MLP), underscoring the importance of convolution in extracting region-specific features.
Localization and Recognition Evaluation: The paper separates localization errors from recognition errors and shows that deeper, convolutional region-wise classifiers substantially reduce localization errors.
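
The distinction between localization and recognition errors can be illustrated with a short sketch. This is an assumption-laden illustration rather than the paper's evaluation code: the helper names `iou` and `categorize` are hypothetical, and the thresholds (0.5 for a correct detection, 0.1 as the lower bound for a localization error) follow common diagnostic practice and are not stated in this summary. Duplicate detections of the same object are not handled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def categorize(det_box: Box, det_label: str,
               ground_truth: List[Tuple[Box, str]],
               hit_iou: float = 0.5,       # assumed threshold
               loc_iou: float = 0.1) -> str:  # assumed threshold
    """Label a detection as correct, a localization error, or a recognition error."""
    # Best overlap with any ground-truth box of the predicted class.
    best = max((iou(det_box, box) for box, label in ground_truth
                if label == det_label), default=0.0)
    if best >= hit_iou:
        return "correct"
    if best >= loc_iou:
        # Right class, but the box is poorly placed.
        return "localization error"
    # Wrong class or background.
    return "recognition error"


ground_truth = [((0.0, 0.0, 10.0, 10.0), "cat")]
print(categorize((0.0, 0.0, 10.0, 10.0), "cat", ground_truth))
print(categorize((5.0, 0.0, 15.0, 10.0), "cat", ground_truth))
print(categorize((0.0, 0.0, 10.0, 10.0), "dog", ground_truth))
```

Under this scheme, a detector whose false positives are mostly "localization error" has recognized the object but placed the box badly, which is the failure mode the summary says deeper convolutional classifiers reduce.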

Experimental Insights

The empirical evaluations are conducted on the PASCAL VOC and MS COCO datasets, with detailed ablation studies isolating the contribution of each NoC design choice. With VGG-16 models, deeper and convolutional classifiers yield a substantial mAP boost, which is particularly visible in comparisons with other state-of-the-art methods such as Fast R-CNN and Faster R-CNN.

On MS COCO, the paper illustrates the crucial role of NoC designs in the performance of Faster R-CNN combined with ResNet architectures, attributing gains primarily to better region-wise classification rather than additional feature layers.

Implications and Future Scope

The findings imply that deeper, better-designed region-wise classifiers can further improve object detection accuracy, particularly when combined with strong feature extractors such as ResNet. They also suggest a direction for future work: modular classifiers on convolutional feature maps that can be adapted across network architectures.

Looking ahead, exploring further variants of maxout and adaptive pooling could refine how regional features are extracted and classified, and combining NoCs with newer detection frameworks may improve both efficiency and accuracy.

Overall, the paper clarifies how detection accuracy depends on both a high-capacity backbone and a well-designed region-wise classifier, and shows that the latter cannot be treated as an afterthought.
