BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

Published 11 Jul 2019 in cs.CV | (1907.05047v2)

Abstract: We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.

Citations (268)

Summary

  • The paper introduces BlazeFace, a neural network that achieves sub-millisecond face detection on mobile GPUs by leveraging a lightweight feature extractor and an optimized GPU-friendly anchor scheme.
  • The model runs at 200 to 1000+ FPS on flagship devices and reaches 98.61% average precision, with lower inference time than a MobileNetV2-SSD baseline (0.6 ms vs. 2.1 ms on an Apple iPhone XS).
  • BlazeFace supports real-time AR pipelines by supplying an accurate facial region of interest to downstream task-specific models, such as facial keypoint estimation or expression classification, and uses a tie resolution strategy to stabilize its predictions.

Introduction

The paper presents BlazeFace, a neural network model designed specifically for face detection on mobile GPUs. It reaches inference speeds of 200 to 1000+ FPS on flagship devices. This makes BlazeFace well suited to augmented reality (AR) pipelines, where an accurate facial region of interest is the input to tasks such as facial keypoint estimation, expression classification, and face region segmentation.

Architectural Innovations

BlazeFace introduces multiple architectural innovations:

  1. Lightweight Feature Extractor:
    • Inspired by, but distinct from, MobileNetV1/V2, the feature extractor is redesigned for lightweight object detection.
    • The network is built from BlazeBlocks, which enlarge the depthwise convolution kernel to 5x5 (Figure 1).

      Figure 1: BlazeBlock (left) and double BlazeBlock.

  2. GPU-friendly Anchor Scheme:
    • The anchor scheme is adapted from SSD and modified to make GPU computation more efficient.
    • Rather than defining anchors across a series of progressively downsampled resolutions as SSD does, BlazeFace stops downsampling at the 8x8 feature map and concentrates its anchors there (Figure 2).

      Figure 2: Anchor computation: SSD (left) vs. BlazeFace.

  3. Tie Resolution Strategy:
    • BlazeFace replaces the suppression step of traditional non-maximum suppression with a blending strategy that estimates each bounding box as a weighted mean of the overlapping predictions, which stabilizes predictions over time. The paper reports a 10% increase in accuracy and a 30-40% reduction in jitter.
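
The tie resolution idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of the detection score as the blending weight, the IoU threshold of 0.3, and the names `iou` and `blend_detections` are assumptions made for the example.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max, score); this layout is an assumption for the sketch.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def blend_detections(boxes: List[Box], iou_threshold: float = 0.3) -> List[Box]:
    """Merge overlapping detections into score-weighted mean boxes.

    Plain NMS would keep only the top-scoring box of each overlapping
    group; here the whole group contributes to the output coordinates.
    """
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    blended: List[Box] = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        # Group the top box with every remaining box that overlaps it enough.
        cluster = [top] + [b for b in rest if iou(top, b) >= iou_threshold]
        remaining = [b for b in rest if iou(top, b) < iou_threshold]
        total = sum(b[4] for b in cluster)
        if total > 0.0:
            coords = tuple(
                sum(b[i] * b[4] for b in cluster) / total for i in range(4)
            )
        else:
            # All scores are zero: fall back to the top box unchanged.
            coords = top[:4]
        blended.append((coords[0], coords[1], coords[2], coords[3], top[4]))
    return blended


detections = [
    (0.0, 0.0, 10.0, 10.0, 0.9),
    (1.0, 1.0, 11.0, 11.0, 0.6),    # overlaps the first box
    (50.0, 50.0, 60.0, 60.0, 0.8),  # a separate face
]
for box in blend_detections(detections):
    print(box)
```

Because every overlapping prediction contributes to the output, a small frame-to-frame change in which anchor scores highest moves the blended box only slightly, which is one way to read the reported reduction in jitter.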

Performance Evaluation

The paper benchmarks BlazeFace on several flagship mobile devices. On an Apple iPhone XS, inference takes 0.6 ms, compared with 2.1 ms for a comparable MobileNetV2-SSD model. BlazeFace reaches an average precision (AP) of 98.61%, so the lower latency is achieved while detection accuracy stays high.
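
To relate the per-frame latencies to the frame rates quoted earlier, the arithmetic can be written out as below. The helper name `fps_from_latency_ms` is hypothetical, and the result is the throughput of the detector alone, assuming frames are processed back to back with no other pipeline cost.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second if each frame takes `latency_ms` milliseconds."""
    if latency_ms <= 0.0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


blazeface_ms = 0.6       # iPhone XS figure quoted above
mobilenet_ssd_ms = 2.1   # iPhone XS figure quoted above

speedup = mobilenet_ssd_ms / blazeface_ms

print(f"BlazeFace:       {fps_from_latency_ms(blazeface_ms):.0f} FPS")      # 1667 FPS
print(f"MobileNetV2-SSD: {fps_from_latency_ms(mobilenet_ssd_ms):.0f} FPS")  # 476 FPS
print(f"Speedup:         {speedup:.1f}x")                                   # 3.5x
```

At 0.6 ms per frame the detector alone would sustain roughly 1667 FPS, which is consistent with the "1000+ FPS" upper end of the quoted range.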

Applications in AR Pipelines

The effectiveness of BlazeFace was tested in downstream AR applications. Because it specializes in face detection, BlazeFace efficiently provides the face bounding boxes and keypoints needed for precise facial alignment in 2D/3D facial keypoint extraction and expression classification. Delivering rotated and centered face crops also helps subsequent face-specific models, which then need less translation and rotation invariance of their own.
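
One way to picture how keypoints lead to a rotated and centered crop is sketched below. It assumes that two eye-center keypoints and a face center are available from the detector; the function names `roll_angle` and `to_crop_frame` are hypothetical, and the paper's pipeline may compute the alignment differently.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels, y pointing down


def roll_angle(left_eye: Point, right_eye: Point) -> float:
    """Angle in radians of the eye line relative to the image x-axis.

    `left_eye` is the eye that appears further left in the image.
    """
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def to_crop_frame(point: Point, center: Point, angle: float) -> Point:
    """Express an image point in a frame centered on the face and rotated
    by -angle, so that the eye line becomes horizontal."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return (dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a)


# A face tilted by 45 degrees: the eye on the image-left sits lower.
left_eye, right_eye = (40.0, 60.0), (60.0, 40.0)
center = (50.0, 50.0)

angle = roll_angle(left_eye, right_eye)
aligned_left = to_crop_frame(left_eye, center, angle)
aligned_right = to_crop_frame(right_eye, center, angle)
print(math.degrees(angle))
print(aligned_left, aligned_right)
```

After the transform both eyes lie on the same horizontal line around the crop center, so a downstream model always sees an upright, centered face regardless of how the head was tilted in the original frame.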

(Figure 3)

Figure 3: Application pipeline example demonstrating BlazeFace's output juxtaposed with subsequent task-specific processing.

This efficient pipeline is vital for real-time AR applications, as it ensures precise tracking capabilities with minimal latency.

Conclusion

BlazeFace represents a pivotal step toward real-time face detection on mobile platforms by leveraging optimizations tailored for GPU inference. Its high performance, achieved through innovative architectural adjustments, not only benefits AR applications but also extends to various other domains requiring rapid and reliable face detection. The model sets a new standard in balancing speed and accuracy, facilitating broader adoption in embedded systems and mobile AR solutions. Future developments could explore further architectural refinements and adapt BlazeFace to encompass additional face attributes, enhancing its applicability across a wider spectrum of applications.
