
FaceBoxes: A CPU Real-time Face Detector with High Accuracy (1708.05234v4)

Published 17 Aug 2017 in cs.CV

Abstract: Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB. Code is available at https://github.com/sfzhang15/FaceBoxes

Citations (256)

Summary

  • The paper introduces FaceBoxes, a face detector that combines real-time CPU speed with high detection accuracy (a 96.0% true positive rate at 1,000 false positives on FDDB).
  • It employs Rapidly Digested Convolutional Layers for efficient computation and Multiple Scale Convolutional Layers to detect faces at varying scales.
  • The anchor densification strategy improves recall on small faces; together, these design choices show that a lightweight network can meet real-time constraints.


The paper presents FaceBoxes, a face detector that runs in real time on a CPU while maintaining high detection accuracy. The authors target a persistent challenge in face detection: building models that are computationally efficient yet still effective on unconstrained images containing faces at widely varying scales.

Methodology

The FaceBoxes architecture is characterized by two main components: Rapidly Digested Convolutional Layers (RDCL) and Multiple Scale Convolutional Layers (MSCL).

  • RDCL: This component is designed for fast computation. It shrinks the input spatial size quickly through large-stride convolution and pooling layers, chooses kernel sizes that balance speed against the information lost to this aggressive downsampling, and keeps the number of output channels small. The C.ReLU (Concatenated ReLU) activation concatenates each convolution output with its negation before applying ReLU, doubling the number of output channels without extra convolution cost and improving speed with little loss in accuracy.
  • MSCL: To handle the wide variation in face scale, MSCL uses a multi-scale design along both network depth and width. Along depth, anchors are attached to several feature maps of decreasing resolution; along width, Inception modules with multiple kernel sizes broaden the range of receptive fields, so the network can detect faces of varying sizes.
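The two RDCL ideas can be illustrated with a minimal plain-Python sketch. The stride values (4, 2, 2, 2 for conv1, pool1, conv2, pool2) are an assumption modelled on the paper's architecture rather than read from the released code. `crelu` is a simplified C.ReLU that omits the batch normalization and learned scale/shift a real network would include. `total_stride`, `rdcl_output_size`, and `crelu` are hypothetical helper names.

```python
from math import prod
from typing import List, Tuple

# Assumed RDCL layout as (layer name, stride); an illustration of the
# "shrink the spatial size quickly" idea, not the released configuration.
RDCL_STRIDES: List[Tuple[str, int]] = [
    ("conv1", 4),
    ("pool1", 2),
    ("conv2", 2),
    ("pool2", 2),
]


def total_stride() -> int:
    """Overall downsampling factor of the assumed RDCL stack."""
    return prod(stride for _, stride in RDCL_STRIDES)


def rdcl_output_size(width: int, height: int) -> Tuple[int, int]:
    """Feature-map size after RDCL, assuming padding yields ceil(size / stride)."""
    for _, stride in RDCL_STRIDES:
        width = -(-width // stride)    # ceiling division
        height = -(-height // stride)
    return width, height


def crelu(channels: List[List[float]]) -> List[List[float]]:
    """Simplified C.ReLU: concatenate ReLU(x) and ReLU(-x) along the channel axis.

    Each inner list is one channel of flattened spatial values. The output has
    twice as many channels as the input, so the preceding convolution only has
    to compute half of them. BatchNorm and scale/shift are omitted.
    """
    positive = [[max(v, 0.0) for v in ch] for ch in channels]
    negative = [[max(-v, 0.0) for v in ch] for ch in channels]
    return positive + negative


if __name__ == "__main__":
    print(total_stride())              # 32
    print(rdcl_output_size(640, 480))  # (20, 15)
    print(crelu([[1.0, -2.0], [0.5, -0.25]]))
```

Under these assumed strides, a 640x480 VGA frame is reduced to a 20x15 feature map before the multi-scale layers run, which illustrates why the later layers are cheap to evaluate on a CPU.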

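Additionally, the paper introduces an anchor densification strategy that improves the recall rate of small faces by giving every anchor type the same density on the image. Anchor density is defined as the ratio of an anchor's scale to its tiling interval, so small anchors tiled at the same interval as larger ones are comparatively sparse; the strategy compensates by tiling extra small anchors around each receptive-field center. This matters because small faces are the hardest to detect.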
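The density rule can be made concrete with a small sketch. The anchor configuration below (32, 64, and 128 px anchors tiled every 32 px, 256 px anchors every 64 px, 512 px anchors every 128 px) is an assumption modelled on the paper's setup. The symmetric centre layout in `densified_centers` is one possible way to tile n x n anchors per cell and is not necessarily what the released code does. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed configuration: anchor scale (px) -> tiling interval (px), where the
# interval is the stride of the feature map the anchor is placed on.
ANCHOR_INTERVALS: Dict[int, int] = {32: 32, 64: 32, 128: 32, 256: 64, 512: 128}


def anchor_density(scale: int, interval: int) -> float:
    """Anchor density = anchor scale / tiling interval."""
    return scale / interval


def densify_factors(config: Dict[int, int]) -> Dict[int, int]:
    """Factor n per scale so every anchor type reaches the highest density.

    Densifying by n shrinks the effective interval to interval / n, which
    multiplies the density by n and places n * n anchors in each cell.
    """
    target = max(anchor_density(s, i) for s, i in config.items())
    return {s: round(target / anchor_density(s, i)) for s, i in config.items()}


def densified_centers(cx: float, cy: float, interval: int, n: int) -> List[Tuple[float, float]]:
    """Tile n * n anchor centres uniformly inside the cell centred at (cx, cy).

    Offsets are symmetric about the cell centre:
    (k + 0.5) * interval / n - interval / 2 for k in 0..n-1.
    """
    step = interval / n
    offsets = [(k + 0.5) * step - interval / 2 for k in range(n)]
    return [(cx + dx, cy + dy) for dy in offsets for dx in offsets]


if __name__ == "__main__":
    print(densify_factors(ANCHOR_INTERVALS))  # {32: 4, 64: 2, 128: 1, 256: 1, 512: 1}
    print(densified_centers(16.0, 16.0, 32, 2))
```

Under this assumed configuration the 32 px anchors have density 1, the 64 px anchors density 2, and the rest density 4. The sketch therefore densifies the 32 px anchors 4 times (16 anchors per cell) and the 64 px anchors 2 times (4 anchors per cell).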

Experimental Results

Evaluated on three prominent benchmark datasets (AFW, PASCAL face, and FDDB), FaceBoxes achieves state-of-the-art performance. On FDDB it reaches a 96.0% true positive rate at 1,000 false positives, outperforming contemporary detectors while running at 20 frames per second (FPS) on a single CPU core and 125 FPS on a GPU for VGA-resolution images.
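FDDB's standard evaluation reports ROC curves that plot the true positive rate against the total number of false positives over the whole dataset. The following minimal sketch reads a single point off such a curve. It assumes detections have already been matched to ground-truth faces upstream, with each face matched at most once. `tpr_at_fp` is a hypothetical helper and is not part of the official FDDB evaluation tool.

```python
from typing import List, Tuple


def tpr_at_fp(detections: List[Tuple[float, bool]], num_faces: int, max_fp: int) -> float:
    """True positive rate at the lowest score threshold admitting <= max_fp false positives.

    detections: (score, is_true_positive) pairs pooled over the whole dataset.
    num_faces: total number of annotated faces.
    """
    tp = fp = 0
    # Lower the threshold one detection at a time, from most to least confident.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break  # this false positive would exceed the budget
    return tp / num_faces


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
    print(tpr_at_fp(dets, num_faces=4, max_fp=1))  # 0.5
```

Ties between scores are ignored here for simplicity; a full evaluation would also sweep every threshold to trace the complete curve.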

Implications and Future Work

The main contribution of FaceBoxes is combining fast processing with high accuracy, which is especially valuable for real-time applications on resource-constrained devices. This result challenges the traditional trade-off between computational efficiency and detection performance in face detection.

Looking forward, FaceBoxes provides a foundation for further research on lightweight yet powerful network architectures. Its modular design suggests it could be adapted beyond face detection to other tasks that require real-time image processing. Future work could explore deploying FaceBoxes on edge computing platforms, where limited computing power makes efficient yet robust neural networks essential.

Overall, FaceBoxes offers a practical and influential approach to face detection and marks a significant step toward deploying detectors in diverse computational environments.
