- The paper introduces FaceBoxes, a face detector that runs in real time on a CPU while keeping high accuracy (a 96.0% true positive rate at 1,000 false positives on FDDB).
- It employs Rapidly Digested Convolutional Layers for efficient computation and Multiple Scale Convolutional Layers to detect faces at varying scales.
- An anchor densification strategy improves the recall of small faces. Together, these designs point toward lightweight networks for real-time applications.
FaceBoxes: A CPU Real-time Face Detector with High Accuracy
The paper presents FaceBoxes, a face detection framework that runs in real time on CPU devices while maintaining high detection accuracy. The authors address a persistent challenge in face detection: building models that are computationally efficient yet still accurate on unconstrained images containing faces at widely varying scales.
Methodology
The FaceBoxes architecture is characterized by two main components: Rapidly Digested Convolutional Layers (RDCL) and Multiple Scale Convolutional Layers (MSCL).
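- RDCL: This component makes the network fast on CPU by shrinking the input spatial size quickly, through large strides (4, 2, 2 and 2 across its four layers, a total stride of 32) combined with suitably large kernels, and by keeping the number of output channels small. It uses the C.ReLU (Concatenated ReLU) activation, which concatenates a layer's output with its negation before applying ReLU. Because filters in early layers tend to come in pairs of opposite sign, this doubles the number of output channels while the convolution only computes half of them, increasing speed without a significant loss in accuracy.
- MSCL: To handle faces of widely varying scales, MSCL applies a multi-scale design along both network depth and width. Along depth, anchors are attached to several layers whose feature maps have different resolutions and receptive fields. Along width, Inception modules combine branches with different kernel sizes, broadening the range of receptive fields so that the network can detect faces of different sizes effectively.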
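The two RDCL ideas, C.ReLU and rapid spatial shrinking, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: each channel is represented as a flat list of floats, and convolution weights, batch normalization, and any learned scale or shift are omitted. The names `crelu`, `window_out`, `rdcl_output_size`, and `RDCL_LAYERS` are hypothetical, and the kernel sizes and paddings are assumptions; strides of 4, 2, 2 and 2 multiply to a total stride of 32.

```python
from typing import List, Tuple

Channels = List[List[float]]  # one flat list of activations per channel


def crelu(channels: Channels) -> Channels:
    """Concatenated ReLU: stack ReLU(x) and ReLU(-x) along the channel axis.

    The output has twice as many channels as the input, so the convolution
    feeding it only has to compute half of the final channels.
    """
    positive = [[max(v, 0.0) for v in ch] for ch in channels]
    negative = [[max(-v, 0.0) for v in ch] for ch in channels]
    return positive + negative


def window_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical RDCL settings (name, kernel, stride, padding).
# Kernel sizes and paddings are assumptions; strides multiply to 32.
RDCL_LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv1", 7, 4, 3),
    ("pool1", 3, 2, 1),
    ("conv2", 5, 2, 2),
    ("pool2", 3, 2, 1),
]


def rdcl_output_size(width: int, height: int) -> Tuple[int, int]:
    """Spatial size of the feature map leaving RDCL for a given input size."""
    for _name, kernel, stride, padding in RDCL_LAYERS:
        width = window_out(width, kernel, stride, padding)
        height = window_out(height, kernel, stride, padding)
    return width, height


if __name__ == "__main__":
    print(crelu([[1.0, -2.0]]))        # [[1.0, 0.0], [0.0, 2.0]]
    print(rdcl_output_size(640, 480))  # (20, 15) for a VGA frame
```

Under these assumed settings, a 640×480 VGA frame is reduced to a 20×15 feature map after only four layers, so the deeper, more expensive layers operate on very few spatial positions.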
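Additionally, the paper introduces an anchor densification strategy that improves the recall of small faces by giving anchors of every scale the same tiling density. The density of an anchor type is its scale divided by its tiling interval (the stride of the layer it is attached to): A_density = A_scale / A_interval. Because the 32×32 and 64×64 anchors share the 32-pixel tiling interval of the 128×128 anchors, they are four and two times sparser, respectively, so small faces are covered by too few anchors. Densification tiles additional, shifted copies of these small anchors around the center of each receptive field until every anchor scale reaches the same density.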
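Anchor densification can be sketched as follows. The sketch assumes the anchor layout of the FaceBoxes paper as we understand it: 32-, 64- and 128-pixel anchors tiled every 32 pixels, 256-pixel anchors every 64 pixels, and 512-pixel anchors every 128 pixels. Treat this layout, the centered per-cell offsets in `tile_centers`, and all helper names as illustrative assumptions. Densifying an anchor type n times places n × n shifted copies in each cell.

```python
from typing import Dict, List, Tuple

# Assumed layout: anchor scale (pixels) -> tiling interval, i.e. the stride
# (pixels) of the layer the anchor is attached to.
ANCHOR_INTERVALS: Dict[int, int] = {32: 32, 64: 32, 128: 32, 256: 64, 512: 128}


def anchor_density(scale: int, interval: int) -> float:
    """A_density = A_scale / A_interval."""
    return scale / interval


def densify_factors(intervals: Dict[int, int]) -> Dict[int, int]:
    """Per-scale factor n that lifts every anchor type to the highest density."""
    target = max(anchor_density(s, i) for s, i in intervals.items())
    return {s: round(target / anchor_density(s, i)) for s, i in intervals.items()}


def tile_centers(cell_x: int, cell_y: int, interval: int, n: int) -> List[Tuple[float, float]]:
    """n*n anchor centers spread uniformly over one interval x interval cell.

    The centered (i + 0.5) offsets are an assumption for illustration.
    """
    step = interval / n
    return [
        (cell_x * interval + (i + 0.5) * step, cell_y * interval + (j + 0.5) * step)
        for j in range(n)
        for i in range(n)
    ]


if __name__ == "__main__":
    factors = densify_factors(ANCHOR_INTERVALS)
    print(factors)  # {32: 4, 64: 2, 128: 1, 256: 1, 512: 1}
    # Anchors per location on the stride-32 layer: 4*4 + 2*2 + 1*1
    per_location = sum(n * n for s, n in factors.items() if ANCHOR_INTERVALS[s] == 32)
    print(per_location)  # 21
    print(tile_centers(0, 0, 32, 2))  # [(8.0, 8.0), (24.0, 8.0), (8.0, 24.0), (24.0, 24.0)]
```

Under this layout the 32×32 anchors are densified 4 times and the 64×64 anchors 2 times, so each location on the stride-32 layer carries 16 + 4 + 1 = 21 anchors instead of 3.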
Experimental Results
Evaluated on the AFW, PASCAL Face, and FDDB benchmarks, FaceBoxes achieves state-of-the-art performance. On FDDB it reaches a true positive rate of 96.0% at 1,000 false positives. It attains this accuracy while running at 20 frames per second (FPS) on a single CPU core and 125 FPS on a GPU for VGA-resolution (640×480) images.
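FDDB results are conventionally reported as points on a receiver operating characteristic (ROC) curve: the fraction of annotated faces that are detected when the score threshold is lowered until a fixed number of false positives is reached. The sketch below computes one such point. It assumes detections have already been matched against ground truth (FDDB's own matching and its continuous and discontinuous scoring rules are not reproduced), and `tpr_at_fp` is a hypothetical helper.

```python
from typing import List, Tuple


def tpr_at_fp(detections: List[Tuple[float, bool]], num_faces: int, max_fp: int) -> float:
    """True positive rate at the lowest score threshold admitting <= max_fp false positives.

    detections: (confidence, is_true_positive) pairs already matched against
    the ground-truth annotations (matching is not done here).
    num_faces: total number of annotated faces in the dataset.
    Tied scores are not grouped; detections are visited in sorted order.
    """
    tp = fp = 0
    for _score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break  # lowering the threshold further would exceed the FP budget
    return tp / num_faces


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
    print(tpr_at_fp(dets, num_faces=4, max_fp=1))  # 0.5
```

For a result like the one reported here, `num_faces` would be the number of annotated faces in FDDB and `max_fp` would be 1,000.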
Implications and Future Work
The main contribution of FaceBoxes is combining high processing speed with high accuracy, which is especially useful for real-time applications on resource-constrained devices. This narrows the traditional trade-off between computational efficiency and accuracy in face detection.
Looking forward, FaceBoxes lays a foundation for further research on lightweight yet accurate network architectures. Its modular design suggests potential for adaptation beyond face detection, for example to other tasks that require real-time image processing. Future work could also explore deploying FaceBoxes on edge computing platforms, where limited computing power calls for neural networks that are both efficient and robust.
Overall, FaceBoxes offers a practical approach to face detection and marks a significant step toward deploying accurate detectors in diverse computational environments.