RetinaFace: Single-stage Dense Face Localisation in the Wild (1905.00641v2)

Published 2 May 2019 in cs.CV

Abstract: Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantages of joint extra-supervised and self-supervised multi-task learning. Specifically, We make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting a pixel-wise 3D shape face information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state of the art average precision (AP) by 1.1% (achieving AP equal to 91.4%). (4) On the IJB-C test set, RetinaFace enables state of the art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run real-time on a single CPU core for a VGA-resolution image. Extra annotations and code have been made available at: https://github.com/deepinsight/insightface/tree/master/RetinaFace.


Summary

  • The paper introduces a single-stage detector that combines manually annotated five-point facial landmarks with a self-supervised mesh decoder branch for improved face detection.
  • It achieves 91.4% average precision on the WIDER FACE hard test set and runs in real-time on a single CPU core with lightweight backbones.
  • The extra annotations and code are publicly available, paving the way for further work on robust and efficient face localisation.

Overview of RetinaFace: Single-stage Dense Face Localisation in the Wild

The paper "RetinaFace: Single-stage Dense Face Localisation in the Wild" presents a robust approach for face detection and localisation using a single-stage method that leverages both extra-supervised and self-supervised multi-task learning. The proposed method, named RetinaFace, aims to address the persistent challenges in accurate face localisation across varying conditions and scales in natural settings.

Contributions and Methodology

RetinaFace introduces significant advancements through several key contributions:

  1. Enhanced Annotation: The authors manually annotated five facial landmarks on the WIDER FACE dataset, resulting in notable improvements in detecting hard-to-detect faces. This additional annotation serves as an extra supervision signal that enhances the performance of the model.
  2. Self-supervised Mesh Decoder: A self-supervised mesh decoder branch was added to predict pixel-wise 3D facial shapes in parallel with supervised branches. This approach enriches the model's ability to understand facial structures more comprehensively.
  3. Performance Metrics: RetinaFace demonstrates superior performance by outperforming state-of-the-art methods with an average precision (AP) of 91.4% on the WIDER FACE hard test set, and by improving the face verification accuracy (TAR = 89.59% at FAR = 1e-6) on the IJB-C dataset using ArcFace as the baseline.
  4. Real-time Capability: By employing lightweight backbone networks, RetinaFace operates in real-time on a single CPU core for VGA-resolution images, highlighting its practical applicability.
  5. Public Resources: The paper provides access to extra annotations and code, facilitating further research and development in the field.
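
The joint supervision described in the contributions above amounts to combining several per-branch losses into one multi-task objective. The sketch below is a minimal illustration of that idea, assuming the classification, box-regression, landmark, and mesh losses are already computed elsewhere; the weight values are illustrative assumptions, not necessarily the paper's exact settings.

```python
def multitask_loss(cls_loss, box_loss, landmark_loss, mesh_loss,
                   w_box=0.25, w_landmark=0.1, w_mesh=0.01):
    """Weighted sum of per-branch losses for one batch of positive anchors.

    The weights down-scale the auxiliary regression signals relative to the
    face/non-face classification loss (values here are illustrative).
    """
    return (cls_loss
            + w_box * box_loss
            + w_landmark * landmark_loss
            + w_mesh * mesh_loss)
```

In a training loop this scalar would be back-propagated through all branches at once, which is how the extra-supervised (landmark) and self-supervised (mesh) signals influence the shared backbone.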

The RetinaFace model is built on a feature pyramid backbone with additional context modules that enlarge the receptive field and improve detection across face scales. Dense face localisation here means pixel-wise prediction: at each position the network jointly outputs a face score, a bounding box, five facial landmarks, and dense 3D face correspondences, which improves spatial accuracy on challenging benchmarks such as WIDER FACE.
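To make the multi-scale tiling concrete, the sketch below enumerates anchor centre positions over a small feature pyramid, in the spirit of single-stage detectors like RetinaFace. The stride values and image size are assumptions for illustration only; the paper's actual pyramid levels and anchor configuration may differ.

```python
def anchor_centres(image_size, strides=(8, 16, 32)):
    """Map each feature-pyramid stride to its grid of anchor centres.

    Each level of stride s tiles the image with one centre per s x s cell,
    placed at the cell midpoint. Finer strides yield denser grids, which
    is what lets a single-stage detector cover both tiny and large faces.
    """
    w, h = image_size
    levels = {}
    for s in strides:
        levels[s] = [((x + 0.5) * s, (y + 0.5) * s)
                     for y in range(h // s)
                     for x in range(w // s)]
    return levels
```

For a 640x640 input this gives 6400 centres at stride 8 but only 400 at stride 32, illustrating why most anchors (and most detected hard faces) come from the finest pyramid level.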

Implications and Future Directions

RetinaFace sets a new benchmark in face localisation, particularly in unrestrained environments, by integrating multiple learning signals in a single-stage framework. The enhancement in landmark detection and alignment serves as a pivotal advancement in improving face recognition systems that depend on accurate face localisation.

The practical implications of this research are vast, spanning applications in security, social media, and human-computer interaction where rapid and reliable face detection is crucial. The ability to execute in real-time on lightweight devices expands its applicability to mobile and edge computing scenarios.

Future research could explore the extension of the combined supervised and self-supervised framework to other object detection tasks. Additionally, further refinement of mesh decoding techniques may enhance the 3D understanding of complex facial expressions, increasing robustness against occlusions and extreme angles.

In summary, RetinaFace demonstrates a comprehensive and efficient solution for dense face localisation in challenging scenarios, marking a substantial contribution to the domain of computer vision. The availability of detailed annotations and code promises to spur further innovations and applications in related AI tasks.
