YOLO-FaceV2: A Scale and Occlusion Aware Face Detector (2208.02019v2)

Published 3 Aug 2022 in cs.CV

Abstract: In recent years, face detection algorithms based on deep learning have made great progress. These algorithms can be generally divided into two categories, i.e. two-stage detectors like Faster R-CNN and one-stage detectors like YOLO. Because of the better balance between accuracy and speed, one-stage detectors have been widely used in many applications. In this paper, we propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. We design a Receptive Field Enhancement module called RFE to enhance the receptive field of small faces, and use NWD Loss to make up for the sensitivity of IoU to the location deviation of tiny objects. For face occlusion, we present an attention module named SEAM and introduce Repulsion Loss to solve it. Moreover, we use a weight function Slide to solve the imbalance between easy and hard samples and use the information of the effective receptive field to design the anchor. The experimental results on the WiderFace dataset show that our face detector outperforms YOLO and its variants on all easy, medium and hard subsets. Source code is available at https://github.com/Krasjet-Yu/YOLO-FaceV2

Authors (6)
  1. Ziping Yu (2 papers)
  2. Hongbo Huang (3 papers)
  3. Weijun Chen (4 papers)
  4. Yongxin Su (3 papers)
  5. Yahui Liu (40 papers)
  6. Xiuying Wang (11 papers)
Citations (73)

Summary

  • The paper introduces novel modules including RFE and SEAM that enhance detection of faces across scales and occlusion scenarios.
  • The approach uses a Slide weighting function to rebalance easy and hard samples, and NWD Loss to offset the sensitivity of IoU to small location deviations on tiny faces.
  • Experiments on the WiderFace dataset demonstrate superior performance over YOLOv5, especially on challenging samples.

Overview of YOLO-FaceV2: A Scale and Occlusion Aware Face Detector

The paper "YOLO-FaceV2: A Scale and Occlusion Aware Face Detector" presents a real-time face detector that addresses three challenges: scale variance, occlusion, and the imbalance between easy and hard samples. Building on the YOLOv5 architecture, the authors add several modules and loss terms intended to improve accuracy while keeping the speed of a one-stage detector.

YOLO-FaceV2 integrates a Receptive Field Enhancement (RFE) module, which leverages dilated convolutions to expand the effective receptive field, facilitating the detection of faces of varying scales. This approach enriches the feature map's representational capacity, enabling enhanced multi-scale fusion. Consequently, the scale-aware capabilities of YOLO-FaceV2 are bolstered, improving its proficiency in detecting small faces—a task notoriously problematic in face detection.
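To make the link between dilation and receptive field concrete, the sketch below computes how many input pixels a single dilated kernel spans. It is a minimal illustration only: the 3x3 kernel size and the dilation rates 1, 2, 3 are assumptions chosen for the example, and the helper names are hypothetical rather than taken from the YOLO-FaceV2 code.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input pixels spanned by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel_size - 1) // 2


# Assumed branch configuration for illustration: 3x3 kernels, dilation 1, 2, 3.
branches = [(3, d) for d in (1, 2, 3)]
spans = [effective_kernel_size(k, d) for k, d in branches]
pads = [same_padding(k, d) for k, d in branches]

for (k, d), span, pad in zip(branches, spans, pads):
    print(f"kernel={k} dilation={d} -> spans {span} px, padding {pad}")
```

Because every branch uses the same number of weights, parallel branches like these widen the context seen at each location without adding parameters per branch; their outputs can then be fused into a single multi-scale feature map.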

Addressing occlusion, the authors incorporate a Separated and Enhancement Attention Module (SEAM). This module employs attention mechanisms to allocate greater weight to unobstructed face regions, thus mitigating the adverse effects of occlusion by enhancing feature extraction. Furthermore, the Repulsion Loss function is introduced to handle intra-class occlusions by encouraging predicted bounding boxes to steer clear of other ground-truth boxes, thereby improving the robustness of Non-Maximum Suppression (NMS).
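The repulsion idea can be sketched with the ground-truth term of Repulsion Loss as it is commonly defined: the overlap between a predicted box and a neighbouring (non-target) ground-truth box is measured as intersection over ground-truth area (IoG) and passed through a smoothed logarithmic penalty. This is a sketch under stated assumptions, not the paper's implementation: the function names and the default `sigma=0.5` are illustrative.

```python
import math


def iog(pred, gt):
    """Intersection over ground-truth area; boxes are (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter_w * inter_h / gt_area


def smooth_ln(x, sigma=0.5):
    """Penalty that grows with overlap x; logarithmic up to sigma, linear after.

    The linear part keeps the penalty finite when x reaches 1.
    sigma must lie in [0, 1).
    """
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)


def rep_gt_penalty(pred, other_gt, sigma=0.5):
    """Penalty for a prediction drifting onto a ground-truth box it is not assigned to."""
    return smooth_ln(iog(pred, other_gt), sigma)


pred = (0.0, 0.0, 2.0, 2.0)
neighbour_gt = (1.0, 0.0, 3.0, 2.0)   # a different face, half covered by pred
far_gt = (10.0, 10.0, 12.0, 12.0)     # a face with no overlap
print(rep_gt_penalty(pred, neighbour_gt), rep_gt_penalty(pred, far_gt))
```

The penalty is zero when the prediction does not touch the neighbouring face and rises as it drifts onto it, which is what pushes boxes of adjacent faces apart before NMS is applied.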

The imbalance between easy and hard samples is another focal point. The Slide Loss weighting function is designed to dynamically adjust the emphasis on hard samples during training, ensuring that the model does not overfit on the abundant easy samples. This adaptive weighting is grounded in the IoU distribution, automatically forming a threshold to distinguish sample difficulty and adjust training influences accordingly.
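One way to read this weighting scheme is as a piecewise function of each sample's IoU, with the threshold μ taken as the mean IoU over all boxes. The sketch below follows that reading. The 0.1 margin below μ, the exponential form, and the name `slide_weight` are assumptions that should be checked against the paper and its source code before reuse.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Per-sample weight that emphasises samples whose IoU is near the threshold mu."""
    if iou <= mu - 0.1:
        return 1.0                      # clearly below the threshold: unchanged
    if iou < mu:
        return math.exp(1.0 - mu)       # just below the threshold: boosted
    return math.exp(1.0 - iou)          # above the threshold: boost decays with IoU


# Illustrative IoU values; mu is derived from the data instead of hand-tuned.
ious = [0.15, 0.42, 0.48, 0.55, 0.90]
mu = sum(ious) / len(ious)
weights = [slide_weight(v, mu) for v in ious]

for v, w in zip(ious, weights):
    print(f"IoU={v:.2f} -> weight {w:.3f}")
```

In this example the samples with IoU close to μ receive the largest weights, while a sample with IoU 0.90 is weighted only slightly above 1, so ambiguous samples contribute more to the loss than confident ones.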

For efficient anchor box design, YOLO-FaceV2 is informed by the concept of effective receptive fields. The authors refine anchor ratios and sizes to match effective receptive fields, thus enhancing the model's bounding box regression capabilities. Additionally, the Normalized Wasserstein Distance (NWD) Loss is integrated into the regression loss function to address limitations of IoU, particularly for small face detection, achieving a balance between large-scale and small-scale face detection performance.
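The motivation for NWD is easiest to see on a tiny box. In the usual formulation, each box (cx, cy, w, h) is modelled as a 2D Gaussian, and the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). The sketch below implements that formulation and compares it with IoU. The normalising constant `c=12.8` is a placeholder (it is dataset dependent), and how the term is weighted against the IoU loss in YOLO-FaceV2 is not shown here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance similarity in (0, 1]; boxes are (cx, cy, w, h)."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Regression loss term: zero for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(box_a, box_b, c)


def iou_cxcywh(a, b):
    """Plain IoU for boxes given as (cx, cy, w, h), used here only for comparison."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


gt = (10.0, 10.0, 6.0, 6.0)        # a 6x6 pixel face
shifted = (16.0, 10.0, 6.0, 6.0)   # prediction off by 6 pixels horizontally
print(iou_cxcywh(gt, shifted), nwd(gt, shifted))
```

For the 6x6 face, a 6-pixel shift drives IoU to zero, so an IoU-based loss gives no graded signal about how far off the prediction is, whereas NWD still returns a smooth, non-zero similarity that shrinks with distance.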

The experimental results on the WiderFace dataset support the effectiveness of YOLO-FaceV2. According to the paper, the model outperforms YOLOv5 and its variants on the easy, medium, and hard subsets. The gains on the challenging hard subset in particular reflect the model's improved handling of small and occluded faces.

In conclusion, YOLO-FaceV2 demonstrates significant advancements in face detection by innovatively addressing scale variance, occlusion, and sample imbalance. The proposed methodologies offer both practical and theoretical implications, suggesting pathways for refining real-time face detection applications. Future directions may include exploring further refinements in loss functions and attention mechanisms to push the boundaries of detection accuracy and efficiency in even more complex datasets or real-world scenarios. The open-source release of this model stands to benefit further research in face detection and related areas.
