RON: Reverse Connection with Objectness Prior Networks for Object Detection (1707.01691v1)

Published 6 Jul 2017 in cs.CV

Abstract: We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, thus RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and low resolution 384X384 input size, the network gets 81.3% mAP on PASCAL VOC 2007, 80.7% mAP on PASCAL VOC 2012 datasets. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3X faster than the Faster R-CNN counterpart.


Overview of RON: Reverse Connection with Objectness Prior Networks for Object Detection

The paper presents RON (Reverse Connection with Objectness Prior Networks), a framework for efficient and effective generic object detection. RON aims to combine the strengths of region-based methods such as Faster R-CNN with those of region-free methods such as SSD. The authors focus on two core problems in object detection: multi-scale object localization and negative sample mining. The first is addressed by the "reverse connection" and the second by the "objectness prior."

Methodology

RON is a fully convolutional network built around two components: reverse connections and an objectness prior.

Reverse Connections: These let the network detect objects on multiple levels of the CNN, which helps with objects of different scales. Layers in a deep network capture different levels of abstraction: earlier layers hold fine-grained detail, while later layers hold more semantic, abstract features. Reverse connections propagate semantic information from later layers back to earlier ones.
Objectness Prior: This component reduces the search space by flagging the regions of the input image that are likely to contain objects, which eases the imbalance between object and non-object samples during training. The resulting objectness prior maps indicate where objects are probably located, so the detector can concentrate on those locations.
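
The idea of passing semantic information from later layers back to earlier ones can be illustrated with a small sketch. This is a simplified stand-in, not the paper's layers: it assumes single-channel maps stored as nested lists, each level half the size of the previous one, and it uses nearest-neighbour upsampling plus element-wise addition where a real network would use learned layers. The names upsample2x, add_maps, reverse_fuse and constant_map are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows).
    Assumption: stands in for a learned upsampling layer."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def reverse_fuse(backbone_maps):
    """Fuse maps from the deepest level back to the shallowest.

    backbone_maps is ordered shallow -> deep. Each fused level is its own
    backbone map plus the upsampled fused map of the level above it.
    """
    fused = [None] * len(backbone_maps)
    fused[-1] = backbone_maps[-1]  # the deepest level has nothing above it
    for n in range(len(backbone_maps) - 2, -1, -1):
        fused[n] = add_maps(backbone_maps[n], upsample2x(fused[n + 1]))
    return fused


def constant_map(size, value):
    """Hypothetical helper: a size x size map filled with one value."""
    return [[value] * size for _ in range(size)]


maps = [constant_map(8, 1.0), constant_map(4, 1.0), constant_map(2, 1.0)]
fused = reverse_fuse(maps)
print(len(fused[0]), fused[0][0][0])  # prints "8 3.0"
```

In this toy run the shallowest fused map accumulates a contribution from every deeper level, which is the sense in which earlier layers gain semantic context.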

The reverse connection, objectness prior, and object detector are optimized jointly with a multi-task loss function, so RON can predict final detection results directly from all locations of the various feature maps.
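
One way to picture how an objectness prior can narrow the search space inside a joint loss is the sketch below. It is an illustration under stated assumptions, not the paper's formulation: the threshold, the equal weights, the dictionary keys, and the function name gated_multitask_loss are all hypothetical, and the localization term is omitted for brevity. The objectness term is computed at every location, while the classification term is computed only where the predicted objectness passes the threshold.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def gated_multitask_loss(locations, obj_threshold=0.5, w_obj=1.0, w_cls=1.0):
    """Two-term loss gated by predicted objectness (illustrative only).

    Each location is a dict with hypothetical keys:
      'obj_prob'      predicted probability that an object is present
      'is_object'     ground-truth flag
      'cls_prob_true' predicted probability of the true class
    Returns (total_loss, number_of_locations_kept_for_classification).
    """
    obj_loss = 0.0
    cls_loss = 0.0
    kept = 0
    for loc in locations:
        p = min(max(loc["obj_prob"], EPS), 1.0 - EPS)
        target = 1.0 if loc["is_object"] else 0.0
        # Binary cross-entropy on objectness, evaluated everywhere.
        obj_loss -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
        # Classification loss only where the prior says "likely object".
        if p >= obj_threshold:
            cls_loss -= math.log(max(loc["cls_prob_true"], EPS))
            kept += 1
    obj_loss /= len(locations)
    cls_loss = cls_loss / kept if kept else 0.0
    return w_obj * obj_loss + w_cls * cls_loss, kept


locations = [
    {"obj_prob": 0.9, "is_object": True, "cls_prob_true": 0.8},
    {"obj_prob": 0.1, "is_object": False, "cls_prob_true": 0.5},
]
loss, kept = gated_multitask_loss(locations)
print(kept, round(loss, 4))
```

Here the low-objectness background location contributes to the objectness term but is skipped by the classifier, which is how gating reduces the number of negative samples the detector has to process.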

Experimental Results

Extensive experiments were conducted on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets to evaluate RON's performance. The results indicate that:

On PASCAL VOC 2007, with VGG-16 backbone and 384×384 resolution, RON achieved a mean Average Precision (mAP) of 81.3%, surpassing several contemporary approaches.
On MS COCO, the RON384++ variant reached a competitive mAP of 27.4%, outperforming both SSD and Faster R-CNN in certain configurations.
The architecture demonstrated notable efficiency with a testing speed of 15 FPS using 1.5G GPU memory, which is approximately three times faster than the Faster R-CNN counterpart.
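
The speed figures can be turned into per-image latency with a line of arithmetic. The sketch below uses only the two numbers quoted above (15 FPS and a roughly threefold speedup); the implied baseline is an estimate derived from those numbers, not a measurement reported here, and the variable names are hypothetical.

```python
ron_fps = 15.0   # test speed quoted above
speedup = 3.0    # "approximately three times faster"

ron_latency_ms = 1000.0 / ron_fps              # time per image for RON
implied_baseline_fps = ron_fps / speedup       # estimate, not a reported figure
implied_baseline_latency_ms = 1000.0 / implied_baseline_fps

print(f"RON: {ron_latency_ms:.1f} ms per image")
print(f"Implied baseline: {implied_baseline_fps:.1f} FPS, "
      f"{implied_baseline_latency_ms:.1f} ms per image")
```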

Implications and Future Directions

The proposed RON framework offers a new lens for viewing object detection architectures by melding aspects of both region-based and region-free approaches, potentially inspiring future models to further embrace hybrid designs. This research underscores the importance of exploiting multi-layer feature representations and refining object search through mechanisms like objectness prior, pointing to a path that could enhance real-time detection applications.

Although RON improves on existing methods, particularly in speed and small-object detection, open questions remain. Future work could integrate more advanced backbone networks, improve robustness under diverse conditions, or extend RON's principles to other computer vision tasks. The proposed techniques may also benefit from further tuning and from testing on a broader set of image domains than those covered by the benchmarks.

Overall, RON contributes an efficient and effective detector that targets two specific limitations of prior methods: multi-scale object detection and negative sample mining.


Authors (6)

Tao Kong
Fuchun Sun
Anbang Yao
Huaping Liu
Ming Lu
Yurong Chen

