Recurrent Scale Approximation for Object Detection in CNN (1707.09531v2)

Published 29 Jul 2017 in cs.CV

Abstract: Since convolutional neural networks (CNNs) lack an inherent mechanism to handle large scale variations, we always need to compute feature maps multiple times for multi-scale object detection, which is the computational bottleneck in practice. To address this, we devise a recurrent scale approximation (RSA) that computes the feature map only once and approximates the maps at the other levels from this single map. At the core of RSA is the recursive rolling-out mechanism: given an initial map at a particular scale, it generates the prediction at a smaller scale that is half the size of the input. To further increase efficiency and accuracy, we (a) design a scale-forecast network to globally predict the potential scales in the image, since there is no need to compute maps on all levels of the pyramid, and (b) propose a landmark retracing network (LRN) to trace back the locations of the regressed landmarks and generate a confidence score for each landmark; LRN can effectively alleviate false positives caused by the accumulated error in RSA. The whole system can be trained end-to-end in a unified CNN framework. Experiments demonstrate that our proposed algorithm is superior to state-of-the-art methods on face detection benchmarks and achieves comparable results for generic proposal generation. The source code of RSA is available at github.com/sciencefans/RSA-for-object-detection.

Summary

The paper introduces Recurrent Scale Approximation (RSA), a novel method that efficiently approximates feature maps at different scales from a single initial computation to reduce computational cost in CNN object detection.
A Scale-Forecast Network predicts likely object scales in an image, enabling the system to focus computational resources efficiently and avoid unnecessary calculations.
Experiments demonstrate that RSA achieves competitive detection accuracy on benchmarks while significantly reducing computational load compared to traditional multi-scale detection approaches.

An Overview of "Recurrent Scale Approximation for Object Detection in CNN"

This paper addresses scale variation in CNN-based object detection, focusing on the computational cost that multi-scale detection traditionally incurs. The authors introduce Recurrent Scale Approximation (RSA): instead of running the network on every level of an image pyramid, RSA computes the feature map at one scale and approximates the maps at the other levels from it.

Core Contributions

The main contribution of the paper is the RSA mechanism. The backbone computes the feature map once, at a single scale, and a recurrent RSA unit then approximates the feature maps at smaller scales: each rollout step takes the current map and predicts the map at half its size. This is meant to remove the usual burden of multi-scale detection, where the feature map must be recomputed from a resized image at every scale; here, every other scale is derived from the single initial feature map.
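
The recursive rollout can be illustrated with a minimal sketch. It assumes feature maps are NumPy arrays of shape (C, H, W) and uses 2x2 average pooling as a hypothetical placeholder for the RSA unit. In the paper the unit is a learned network that approximates the map the backbone would produce at the smaller scale, so this placeholder only reproduces the recursion and the shape bookkeeping, not the learned approximation. The names `placeholder_rsa_unit` and `recurrent_rollout` are illustrative.

```python
import numpy as np


def placeholder_rsa_unit(fmap: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned RSA unit.

    Halves the spatial size of a (C, H, W) feature map with 2x2 average
    pooling; an odd trailing row/column is cropped.
    """
    c, h, w = fmap.shape
    h2, w2 = h // 2, w // 2
    cropped = fmap[:, : 2 * h2, : 2 * w2]
    # Split each spatial axis into (blocks, 2) and average inside each 2x2 block.
    return cropped.reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))


def recurrent_rollout(base_map, num_levels, unit=placeholder_rsa_unit):
    """Return [F_0, F_1, ..., F_{num_levels-1}].

    F_0 is the single map computed by the backbone; each F_{k+1} = unit(F_k)
    is derived from the previous approximation, never from the image.
    """
    maps = [base_map]
    for _ in range(num_levels - 1):
        maps.append(unit(maps[-1]))
    return maps


if __name__ == "__main__":
    f0 = np.random.rand(64, 40, 40).astype(np.float32)  # toy "backbone" output
    for k, fmap in enumerate(recurrent_rollout(f0, 4)):
        print(k, fmap.shape)
```

Run as a script, this prints the shapes (64, 40, 40), (64, 20, 20), (64, 10, 10) and (64, 5, 5). Because each level is computed from the previous approximation rather than from the image, any error compounds with depth; this is the accumulated error that the abstract says the LRN is designed to counter.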

Two additional strategies complement RSA:

Scale-Forecast Network: A network module that globally predicts which object scales are likely to be present in an image. The system then computes feature maps only for those scales and skips pyramid levels that are unlikely to contain objects.
Landmark Retracing Network (LRN): This network traces the landmarks regressed on RSA-approximated maps back to their locations and generates a confidence score for each landmark. Because approximation error accumulates as RSA rolls out to smaller scales, these per-landmark scores help suppress the false positives that this error causes.
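
The two LRN ideas in the list above, retracing landmarks and scoring them, can be sketched under loudly stated assumptions. Landmarks regressed on pyramid level k (downsampled by 2**k) are mapped back to level-0 coordinates by a pure scaling, ignoring any stride offsets a real network would have. The per-landmark confidences, which in the paper come from the LRN itself, are supplied directly here, and a candidate is kept when their mean clears a threshold. The names `retrace_landmarks`, `Candidate` and `lrn_filter`, the mean aggregation and the 0.5 threshold are all hypothetical choices for illustration, not the paper's scoring rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


def retrace_landmarks(landmarks: List[Point], level: int) -> List[Point]:
    """Map landmarks predicted on pyramid `level` back to level-0 coordinates.

    Assumes level k is downsampled by exactly 2**k with no offset, which is a
    simplification of how a real network's strides map coordinates.
    """
    factor = 2 ** level
    return [(x * factor, y * factor) for x, y in landmarks]


@dataclass
class Candidate:
    level: int                    # pyramid level the face was detected on
    landmarks: List[Point]        # landmarks regressed at that level
    landmark_scores: List[float]  # hypothetical per-landmark confidences


def lrn_filter(cands: List[Candidate], threshold: float = 0.5):
    """Keep candidates whose mean landmark confidence is >= threshold.

    Returns (retraced_landmarks, mean_score) pairs for the survivors.
    """
    kept = []
    for c in cands:
        retraced = retrace_landmarks(c.landmarks, c.level)
        score = sum(c.landmark_scores) / len(c.landmark_scores)
        if score >= threshold:
            kept.append((retraced, score))
    return kept
```

Five landmarks per face are used in the test data purely for illustration. Scoring each landmark separately, rather than only the box, means a candidate whose approximated features drifted badly tends to fail on several landmarks at once.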

Performance Evaluation

Experiments indicate that the algorithm outperforms state-of-the-art face detection methods on benchmarks such as AFW and FDDB and delivers results comparable to existing methods for generic object proposal generation, while requiring less computation than traditional approaches such as multi-scale RPNs.

Computational Benefits

The RSA unit reduces the need for multi-shot detection, that is, running the full network separately at each scale: the backbone computes one feature map, and the maps for the remaining scales are approximated from it without significantly sacrificing detection accuracy. The scale-forecast network further cuts computation by predicting which scales occur in the image so that unneeded levels are skipped, while the LRN filters out false positives, improving detection reliability.
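
How a scale forecast combines with the downward-only rollout can be sketched as a scheduling step. The assumptions here are not taken from the paper: the forecaster outputs one probability per pyramid level (level 0 at full resolution, level k downsampled by 2**k); levels at or above a hypothetical threshold need detection; the backbone runs once on the finest needed level; and the RSA unit, which only steps to smaller scales, rolls out from there to the coarsest needed level. Intermediate levels the recursion must pass through receive an RSA step but no detection. The function name `schedule_levels` and the action labels are illustrative.

```python
from typing import Dict, List


def schedule_levels(scale_probs: List[float], threshold: float = 0.5) -> Dict[int, str]:
    """Assign each pyramid level one of:

    'backbone+detect' : run the full CNN here (once) and detect
    'rsa+detect'      : approximate the map with the RSA unit and detect
    'rsa'             : approximate only, as a stepping stone for the recursion
    'skip'            : not computed at all
    """
    needed = [k for k, p in enumerate(scale_probs) if p >= threshold]
    plan = {k: "skip" for k in range(len(scale_probs))}
    if not needed:
        return plan  # forecaster saw no likely scales: nothing to compute
    base, coarsest = min(needed), max(needed)
    plan[base] = "backbone+detect"
    # The rollout is recursive, so every level between base and coarsest
    # must be approximated even if no objects are expected there.
    for k in range(base + 1, coarsest + 1):
        plan[k] = "rsa+detect" if k in needed else "rsa"
    return plan


if __name__ == "__main__":
    probs = [0.05, 0.70, 0.10, 0.90, 0.20, 0.01]  # made-up forecaster output
    for level, action in schedule_levels(probs).items():
        print(level, action)
```

For the made-up probabilities above, level 1 runs the backbone, level 2 is only approximated, level 3 is approximated and searched, and levels 0, 4 and 5 are skipped. Compared with running the backbone on all six levels of a conventional image pyramid, this plan runs it once; the actual saving depends on how much cheaper the RSA unit is than the backbone, which this sketch does not model.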

Implications for Future Research

This work opens up possible advancements in several areas:

Efficiency in Large-Scale Object Detection: The RSA framework provides a pathway to more efficient detection methods, crucial for real-time applications and deployment on resource-constrained devices.
Adaptability to Other Vision Tasks: RSA's principles could be extended to other domains such as segmentation, where scale variance is also a concern.
Temporal and Spatial Efficiency: The model's principles could inspire more efficient handling of representations in video sequences, potentially influencing video-based object detection methods.

In conclusion, "Recurrent Scale Approximation for Object Detection in CNN" presents a promising direction in computer vision, offering both a new way to approximate multi-scale features and practical efficiency benefits. Its ability to maintain competitive accuracy while reducing computational cost could raise efficiency standards for object detection methods and pave the way for adaptable, scalable solutions.

GitHub

liuyuisanai/RSA-for-object-detection: Code and some data for 'Recurrent Scale Approximation for Object Detection in CNN' in ICCV 2017