Finding Tiny Faces (1612.04402v2)

Published 13 Dec 2016 in cs.CV

Abstract: Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally different than those for recognizing a 300px tall face. We take a different approach and train separate detectors for different scales. To maintain efficiency, detectors are trained in a multi-task fashion: they make use of features extracted from multiple layers of single (deep) feature hierarchy. While training detectors for large objects is straightforward, the crucial challenge remains training detectors for small objects. We show that context is crucial, and define templates that make use of massively-large receptive fields (where 99% of the template extends beyond the object of interest). Finally, we explore the role of scale in pre-trained deep networks, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges. We demonstrate state-of-the-art results on massively-benchmarked face datasets (FDDB and WIDER FACE). In particular, when compared to prior art on WIDER FACE, our results reduce error by a factor of 2 (our models produce an AP of 82% while prior art ranges from 29-64%).

Citations (727)

Summary

  • The paper introduces a multi-task, scale-specific framework that trains separate detectors for small and large faces.
  • It employs large receptive fields for contextual reasoning and integrates hierarchical features from deep networks.
  • Using image pyramids and extrapolated pre-trained models, the approach achieves an 82% average precision on challenging face benchmarks.

An Analytical Review of "Finding Tiny Faces"

Introduction

The paper "Finding Tiny Faces" by Peiyun Hu and Deva Ramanan addresses a key challenge in object detection: detecting small objects. The focus here is on small faces, examining the roles of scale invariance, image resolution, and contextual reasoning. Despite substantial advances in object recognition, detecting small objects remains difficult because the cues for recognizing small objects differ fundamentally from those for large ones. This research moves beyond scale invariance, training separate detectors for different scales and using a multi-task approach for feature extraction.

Key Contributions

1. Multi-task Modeling Across Scales:

The authors critique the conventional scale-invariant approach and instead propose a multi-task alternative: separate detectors are trained for different scales, each better able to exploit the distinct cues of small versus large objects. Efficiency is preserved because all detectors share features extracted from multiple layers of a single deep feature hierarchy.
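The scale-specific idea above can be sketched as a dispatcher that routes a candidate face to a detector tuned for its size, with all detectors sharing one backbone. The bucket boundaries (25 px, 100 px), template names, and the toy backbone are illustrative assumptions, not the authors' code:

```python
def shared_backbone(image):
    """Stand-in for a single deep feature hierarchy.

    In practice this would return conv feature maps from several
    layers of one network, shared by every scale-specific detector.
    """
    return {"shallow": image, "deep": image}

def pick_detector(face_height_px):
    """Route a candidate face to a scale-specific template.

    The bucket boundaries here (25 px, 100 px) are hypothetical.
    """
    if face_height_px < 25:
        return "tiny-template"    # relies heavily on surrounding context
    elif face_height_px < 100:
        return "medium-template"
    return "large-template"
```

The point of the design is that each template sees the same shared features but learns its own scale-appropriate cues, rather than forcing one detector to be invariant across a 100x range of sizes.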

2. Contextual Modeling with Large Receptive Fields:

The paper highlights the importance of context for detecting small objects, implementing templates with massively-large receptive fields. Thus, even when small faces provide minimal signal, the extended context facilitates recognition.
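The abstract's figure that "99% of the template extends beyond the object of interest" follows from simple area arithmetic: if a face occupies 1% of the template area, the template side must be 10x the face side. The specific pixel sizes below are illustrative, not taken from the paper:

```python
# Hypothetical sizes: a 3x3-px face inside a 30x30-px context template.
face_px = 3
template_px = 30

# Fraction of the template lying outside the face itself.
beyond = 1 - (face_px ** 2) / (template_px ** 2)   # = 0.99
```

So a template side only 10x the face side already devotes 99% of its area to context, which is how tiny faces with almost no internal signal can still be recognized.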

3. Extrapolation of Pre-trained Networks:

Detection at extreme scales is addressed by analyzing how pre-trained deep networks respond to resized inputs. The authors employ image interpolation (upsampling) and decimation (downsampling), demonstrating that an image-pyramid approach is particularly crucial for identifying small objects.
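A minimal sketch of such an image pyramid, built from 2x decimation (block averaging) and 2x interpolation (here nearest-neighbor replication), is shown below. This is a pure-Python stand-in for illustration; a real system would use bilinear or bicubic resampling on tensors, and the scale levels chosen are assumptions:

```python
def decimate2x(img):
    """Downsample a 2D grid by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [[(img[2*i][2*j] + img[2*i][2*j+1] +
              img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]

def interpolate2x(img):
    """Upsample a 2D grid by nearest-neighbor replication."""
    return [[img[i // 2][j // 2] for j in range(2 * len(img[0]))]
            for i in range(2 * len(img))]

def build_pyramid(img):
    """Return {scale: image} for scales 0.5x, 1x, 2x.

    Running a detector on the 2x-interpolated copy is what makes
    very small faces reachable by a fixed-size template.
    """
    return {0.5: decimate2x(img), 1.0: img, 2.0: interpolate2x(img)}
```

Running the same detector at every pyramid level effectively rescans the image at multiple magnifications, so a tiny face in the original appears at a detectable size in the upsampled copy.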

Methodology

1. Training Strategy:

Separate detectors are trained for different scales using a multi-task learning framework. This is based on features extracted from varying network layers. Such detectors are fine-tuned for different object sizes across a standardized image pyramid, balancing small and large templates.

2. Context Incorporation:

The concept of using a fixed-size receptive field across scale-specific templates is explored. By employing hierarchical features, the authors implement foveal descriptors, which capture fine local details and broader contextual information.
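A foveal descriptor of this kind can be sketched as the concatenation of a high-resolution feature (fine local detail, small receptive field) with a coarse feature sampled at the same spatial location (broad context, large receptive field). The layer shapes, stride, and function below are illustrative assumptions rather than the paper's implementation:

```python
def foveal_descriptor(fine_map, coarse_map, y, x, stride=2):
    """Combine fine detail and coarse context at one location.

    fine_map:   H x W grid of channel vectors (shallow layer).
    coarse_map: (H/stride) x (W/stride) grid of channel vectors
                (deeper layer with a larger receptive field).
    Returns the concatenated channel vector at (y, x).
    """
    fine = fine_map[y][x]                         # local detail
    coarse = coarse_map[y // stride][x // stride]  # surrounding context
    return fine + coarse                           # channel concatenation
```

The design choice is that the coarse half of the descriptor supplies the contextual signal that a tiny face lacks on its own, while the fine half preserves whatever local evidence exists.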

3. Scale-specific Detection:

Distinct from generalized region-proposal networks, this model uses scale-specific detectors tuned for interpolated images, yielding state-of-the-art results on the FDDB and WIDER FACE face-detection benchmarks.

Results

The paper presents empirical evidence supporting the authors' claims of improved small-object detection. The proposed model achieves an average precision (AP) of 82% on WIDER FACE, significantly surpassing prior models, which range from 29% to 64%. Relative to the best prior result, this halves the error, illustrating the effectiveness of the approach.
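The factor-of-2 error reduction can be verified directly from the reported numbers, treating error as 1 - AP:

```python
# AP figures as reported for WIDER FACE: prior best 64%, this paper 82%.
prior_best_ap, paper_ap = 0.64, 0.82

prior_error = 1 - prior_best_ap   # 0.36
paper_error = 1 - paper_ap        # 0.18
reduction = prior_error / paper_error   # factor of 2
```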

On FDDB, the model demonstrates superior performance both in terms of discrete and continuous metrics, outperforming other methods in the literature.

Implications and Future Directions

Practical Implications:

This research has significant implications for real-world applications requiring small object detection, such as surveillance, medical imaging, and autonomous navigation.

Theoretical Implications:

The insights into context utilization and effective template scaling challenge conventional scale-invariant approaches and open avenues for further exploration in the domain of object detection.

Future Research:

Future work could investigate additional optimizations in context encoding and explore alternative multi-scale representation strategies. Research into adaptive feature extraction methods leveraging dynamic receptive fields could further enhance detection performance.

Conclusion

"Finding Tiny Faces" by Hu and Ramanan provides notable advancements in the detection of small objects. By employing scale-specific detectors, leveraging large receptive fields for context, and adapting pre-trained networks, the paper achieves significant improvements over existing methodologies. The rigorous experimentation and empirical validation reinforce the novel contributions of this work, establishing a robust framework for future research and practical implementations in small object detection.
