
S$^3$FD: Single Shot Scale-invariant Face Detector (1708.05237v3)

Published 17 Aug 2017 in cs.CV

Abstract: This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S$^3$FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchor-based detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-the-art detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on a Nvidia Titan X (Pascal) for VGA-resolution images.

Citations (579)

Summary

  • The paper introduces a novel scale-equitable framework that leverages multi-stride anchor layers to effectively capture faces across diverse scales.
  • The paper implements a two-stage scale compensation anchor matching strategy that significantly boosts detection recall for small faces.
  • The paper achieves state-of-the-art average precision (AP) on benchmarks such as WIDER FACE while running in real time at 36 FPS.

Single Shot Scale-invariant Face Detector (S$^3$FD)

The paper "Single Shot Scale-invariant Face Detector (S$^3$FD)" presents a face detector built on a single deep neural network and aimed in particular at small faces, where anchor-based detectors typically degrade. S$^3$FD combines three techniques, described below, to improve detection across face scales while keeping real-time speed.

Key Contributions

  1. Scale-equitable Detection Framework:
    • S$^3$FD introduces a scale-equitable framework with multiple anchor-associated detection layers whose strides range from 4 to 128 pixels, so that faces of every scale have enough features for detection. Anchor scales are chosen to match the effective receptive field of each layer and follow an equal proportion interval principle, which keeps anchor density uniform across scales.
  2. Scale Compensation Anchor Matching:
    • To raise recall for small faces, the paper uses a two-stage scale compensation anchor matching strategy. Because anchor scales are discrete while face scales are continuous, faces that fall between anchor scales match too few anchors; the first stage lowers the matching threshold, and the second stage assigns the best-overlapping anchors to faces that are still under-matched.
  3. Max-out Background Label:
    • The method reduces false positives among small-face detections by applying a max-out background label in the conv3_3 detection layer. This counters the large number of negative anchors generated for small faces and makes the face/background classifier more reliable.
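The anchor layout in item 1 can be illustrated with a minimal sketch. It assumes six detection layers with strides doubling from 4 to 128, one square anchor per feature-map location, and an anchor scale of four times the stride (so scales run from 16 to 512 pixels); these values are recalled from the paper's configuration rather than stated in this summary, and the function name `anchor_layout` is hypothetical.

```python
# Assumed configuration: strides double from 4 to 128 (the range given in the text).
STRIDES = [4, 8, 16, 32, 64, 128]


def anchor_layout(image_size=640, strides=STRIDES, scale_factor=4):
    """Return the stride, anchor scale, and anchor count of each detection layer.

    Assumes a square input and one square anchor per feature-map location.
    Keeping anchor_scale / stride constant is what gives every scale the same
    anchor density (the equal proportion interval principle).
    """
    layout = []
    for stride in strides:
        cells_per_side = image_size // stride
        layout.append({
            "stride": stride,
            "anchor_scale": scale_factor * stride,
            "num_anchors": cells_per_side * cells_per_side,
        })
    return layout


if __name__ == "__main__":
    layers = anchor_layout(640)
    total = sum(layer["num_anchors"] for layer in layers)
    for layer in layers:
        share = 100.0 * layer["num_anchors"] / total
        print(layer["stride"], layer["anchor_scale"], layer["num_anchors"], f"{share:.1f}%")
    print("total anchors:", total)
```

Under these assumptions the stride-4 layer contributes roughly three quarters of all anchors for a 640x640 input, which is why the smallest anchors dominate the pool of negatives.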
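The two-stage matching in item 2 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the thresholds (0.35 for stage one, 0.1 for stage two) are recalled from the paper and not given in this summary, the default `top_n` is taken here as the average stage-one match count over faces that matched at least one anchor, and conflicts where one anchor matches several faces are ignored. The names `iou` and `match_anchors` are hypothetical.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(faces, anchors, stage1_thresh=0.35, stage2_thresh=0.1, top_n=None):
    """Return, for each face, the indices of the anchors matched to it."""
    overlaps = [[iou(face, anchor) for anchor in anchors] for face in faces]

    # Stage one: ordinary threshold matching, with a lowered threshold.
    matches = [[j for j, o in enumerate(row) if o >= stage1_thresh] for row in overlaps]

    # Assumption: N defaults to the average stage-one match count of matched faces.
    if top_n is None:
        counts = [len(m) for m in matches if m]
        top_n = max(1, round(sum(counts) / len(counts))) if counts else 1

    # Stage two: faces with too few matches take their top-N anchors
    # among those whose overlap exceeds the (much lower) second threshold.
    for i, row in enumerate(overlaps):
        if len(matches[i]) < top_n:
            candidates = [j for j, o in enumerate(row) if o > stage2_thresh]
            candidates.sort(key=lambda j: row[j], reverse=True)
            matches[i] = candidates[:top_n]
    return matches


if __name__ == "__main__":
    anchors = [(0, 0, 16, 16), (4, 0, 20, 16), (100, 100, 116, 116)]
    faces = [(0, 0, 16, 16), (0, 0, 8, 8)]  # the second face is smaller than any anchor
    print(match_anchors(faces, anchors))
```

In the example, the 8x8 face overlaps its best anchor by only 0.25, so stage one leaves it unmatched and stage two is what gives it positive anchors.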
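The max-out background label in item 3 can be sketched for a single anchor as below. The sketch assumes the classifier emits several background scores and one face score per anchor, keeps only the highest background score, and then applies a two-way softmax; the number of background scores (three in the usage example) and the function name `maxout_background_probs` are assumptions for illustration.

```python
import math


def maxout_background_probs(background_logits, face_logit):
    """Return (p_background, p_face) for one anchor.

    The max-out step keeps only the highest background logit, so the anchor
    is called a face only if the face score beats every background score.
    """
    background = max(background_logits)
    # Subtract the larger logit before exponentiating for numerical stability.
    shift = max(background, face_logit)
    exp_background = math.exp(background - shift)
    exp_face = math.exp(face_logit - shift)
    total = exp_background + exp_face
    return exp_background / total, exp_face / total


if __name__ == "__main__":
    # Three hypothetical background logits and one face logit for a small anchor.
    print(maxout_background_probs([0.0, 2.0, -1.0], 2.0))
    print(maxout_background_probs([0.0, 2.0, -1.0], 5.0))
```

Compared with a single background score, any one of the background outputs can veto a detection, which biases the smallest anchors toward the background class.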

Performance Evaluation

S$^3$FD reports state-of-the-art performance on multiple face detection benchmarks, including the AFW, PASCAL face, FDDB, and WIDER FACE datasets. On the WIDER FACE validation set, for instance, the model reaches an average precision (AP) of 93.7%, 92.4%, and 85.2% on the easy, medium, and hard subsets, respectively. The hard-subset result in particular reflects its strength on small faces.

Implications and Future Directions

The strong performance of S$^3$FD on the hard subset, which is characterized by small and occluded faces, suggests it is well suited to applications that need accurate real-time face detection in unconstrained environments. The model runs at 36 FPS on VGA-resolution images with an NVIDIA Titan X (Pascal), which makes it practical for real-world deployment.

Future developments could further refine classification strategies for background patches or integrate faster base networks to enhance processing speed. Moreover, exploring categorical distinctions within background patches could yield deeper insights into face detection dynamics.

In summary, S$^3$FD is a significant advance in scale-invariant face detection: it combines a scale-equitable architecture, scale compensation anchor matching, and a max-out background label to handle faces across a wide range of scales.