Deep Face Recognition: A Survey (1804.06655v9)

Published 18 Apr 2018 in cs.CV

Abstract: Deep learning applies multiple processing layers to learn representations of data with multiple levels of feature extraction. This emerging technique has reshaped the research landscape of face recognition (FR) since 2014, launched by the breakthroughs of DeepFace and DeepID. Since then, deep learning technique, characterized by the hierarchical architecture to stitch together pixels into invariant face representation, has dramatically improved the state-of-the-art performance and fostered successful real-world applications. In this survey, we provide a comprehensive review of the recent developments on deep FR, covering broad topics on algorithm designs, databases, protocols, and application scenes. First, we summarize different network architectures and loss functions proposed in the rapid evolution of the deep FR methods. Second, the related face processing methods are categorized into two classes: "one-to-many augmentation" and "many-to-one normalization". Then, we summarize and compare the commonly used databases for both model training and evaluation. Third, we review miscellaneous scenes in deep FR, such as cross-factor, heterogeneous, multiple-media and industrial scenes. Finally, the technical challenges and several promising directions are highlighted.



Summary

The paper systematically reviews state-of-the-art deep learning methods for face recognition.
It analyzes CNN architectures, angular margin losses, and data augmentation techniques to boost performance.
The survey highlights challenges and future directions in security, bias mitigation, and robust FR system design.

Deep Face Recognition: A Survey

The paper "Deep Face Recognition: A Survey" by Mei Wang and Weihong Deng systematically examines how deep learning techniques have been applied to face recognition (FR). The survey covers network architectures, loss functions, face processing, databases, evaluation protocols, and application scenarios. Below, we summarize its key aspects, emphasizing significant findings and future directions.

Introduction and Background

Face recognition has long been a vital biometric technique, with applications spanning military, finance, public security, and daily life. Early FR methods, such as the Eigenface approach, relied primarily on holistic features computed from the whole face image. However, these methods struggled with uncontrolled variations in facial appearance, such as changes in lighting, pose, and expression.
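
To make "holistic features" concrete, here is a minimal NumPy sketch of the Eigenface idea: principal component analysis (PCA) over whole flattened face images, followed by nearest-neighbor matching in the projected space. The function names, the synthetic data one would feed it, and the choice of k are illustrative assumptions; a real Eigenface pipeline also requires aligned, cropped, same-size faces.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """PCA on flattened faces of shape (n_samples, n_pixels); returns mean and top-k eigenfaces."""
    mean = faces.mean(axis=0)
    # Rows of vt are orthonormal principal directions ("eigenfaces"), sorted by variance.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:k]

def project(faces, mean, eigenfaces):
    """Holistic feature: coordinates of each whole face in the eigenface basis."""
    return (faces - mean) @ eigenfaces.T

def nearest_identity(probe, gallery, gallery_labels, mean, eigenfaces):
    """1-nearest-neighbor identification in eigenface space (Euclidean distance)."""
    p = project(probe[None, :], mean, eigenfaces)
    g = project(gallery, mean, eigenfaces)
    return gallery_labels[int(np.argmin(np.linalg.norm(g - p, axis=1)))]
```

Because every pixel contributes to every eigenface coefficient, a local change such as a shadow or a turned head shifts the whole feature vector, which is one way to see why such holistic methods were fragile under lighting and pose changes.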

The paradigm shift began in 2012, when AlexNet won the ImageNet competition and demonstrated the potential of deep learning (DL). DL techniques built on convolutional neural networks (CNNs) were soon applied to FR, and since the breakthroughs of DeepFace and DeepID in 2014, the landscape of FR research has changed dramatically, with substantial gains in accuracy.

Network Architectures and Loss Functions

Evolutionary Path of Network Architectures

The survey outlines the rapid development of network architectures:

Mainstream Architectures: Starting from AlexNet, advancements proceeded through VGGNet, GoogLeNet (Inception), ResNet, and SENet. These architectures have driven progress in FR, with deeper and more complex models achieving higher accuracy.
Adaptive Architectures: This class includes neural architecture search (NAS) [88], [130], [131] and conditional CNNs [89], [90], which automatically search for network configurations or adapt them to the input in order to handle specific variations.
Assembled Networks: Techniques like multi-input networks and assembled task-specific networks have shown improvements by combining features extracted from different perspectives (e.g., pose variations).
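
As a small illustration of two building blocks from the mainstream line above, the following NumPy sketch implements a ResNet-style identity shortcut and a Squeeze-and-Excitation (SE) channel gate as used in SENet. It is a forward pass only; the weights, shapes, and reduction ratio r are hypothetical, and this is not one of the FR-specific networks covered by the survey.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are hypothetical bottleneck weights (reduction ratio r).
    """
    z = x.mean(axis=(1, 2))                    # squeeze: global average pool, one value per channel
    s = sigmoid(w2 @ relu(w1 @ z + b1) + b2)   # excitation: per-channel gates in (0, 1)
    return x * s[:, None, None]                # scale: reweight each channel of the input

def residual_block(x, f):
    """ResNet-style identity shortcut: output = x + F(x)."""
    return x + f(x)
```

In SENet the SE gate is applied to the residual branch before the shortcut addition, which corresponds to calling `residual_block(x, lambda t: se_block(conv_branch(t), ...))`, where `conv_branch` is a placeholder for the block's convolutions.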

Loss Functions for Feature Learning

The survey categorizes the loss functions crucial for training robust FR models:

Euclidean-Distance-Based Losses: These include contrastive loss and triplet loss, which pull features of the same identity together and push features of different identities apart; their effectiveness, however, depends heavily on how training pairs or triplets are selected [35] [81].
Angular/Cosine-Margin-Based Losses: Losses such as L-Softmax and A-Softmax learn more discriminative, large-margin features by imposing an angular margin between classes on a hyperspherical feature space [104] [84].
Variations of Softmax: These include L2-Softmax and other feature or weight normalization strategies, which aim to make features more separable and discriminative under varied real-world conditions [109] [115].
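
The following NumPy sketch shows one common form of each loss family above: contrastive and triplet losses on embeddings, and large-margin cosine logits computed on L2-normalized features and class weights. For the margin family it uses a CosFace-style additive cosine margin, chosen here as a simple representative rather than the L-Softmax or A-Softmax formulations named above. The defaults for the margins and the scale s are illustrative assumptions, not values from the survey.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)

def contrastive_loss(f1, f2, same, margin=1.0):
    """Pull same-identity pairs together; push different pairs beyond `margin` (L2 distance)."""
    d = np.linalg.norm(f1 - f2, axis=-1)
    return np.mean(np.where(same, d ** 2, np.maximum(margin - d, 0.0) ** 2))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Anchor must be closer to positive than to negative by `margin` (squared L2 distances)."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)
    d_an = np.sum((anchor - negative) ** 2, axis=-1)
    return np.mean(np.maximum(d_ap - d_an + margin, 0.0))

def cosine_margin_logits(features, weights, labels, s=30.0, m=0.35):
    """CosFace-style logits: s * cos(theta), with m subtracted from the true-class cosine.

    features: (N, D) embeddings; weights: (K, D) class weights; labels: (N,) class indices.
    """
    cos = l2_normalize(features) @ l2_normalize(weights).T   # (N, K) cosine similarities
    cos[np.arange(len(labels)), labels] -= m                  # margin on the true class only
    return s * cos

def softmax_cross_entropy(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)            # subtract max for stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_p[np.arange(len(labels)), labels])
```

During training one would minimize `softmax_cross_entropy(cosine_margin_logits(f, W, y), y)`. Because the true-class cosine must exceed the others by at least m before the loss becomes small, classes are pushed apart on the hypersphere; the triplet loss, by contrast, needs carefully mined triplets, since most random triplets already satisfy the margin and contribute zero loss.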

Data Handling and Training Protocols

Face Processing Methods

Face processing methods tackle the intrinsic visual variations in facial images:

One-to-Many Augmentation: Methods such as GAN-based augmentation (e.g., DA-GAN) and 3D-model-based synthesis generate many images with different variations (e.g., poses) from a single image, enlarging the training data and improving robustness [70] [71] [55].
Many-to-One Normalization: Approaches such as autoencoders and GAN-based frontalization aim to recover a canonical (e.g., frontal) view from one or more non-frontal images, so that recognition can proceed as if under controlled conditions and is less sensitive to pose or lighting [66] [69].
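
The survey's methods rely on GANs, 3D models, and learned frontalization, which are too large to show here. As a much simpler, classical stand-in for each direction, the NumPy sketch below generates several perturbed copies of one face (flip, small shift, brightness jitter) and aligns five facial landmarks to a template with a least-squares similarity transform (Umeyama's method). The jitter ranges, function names, and landmark coordinates are hypothetical.

```python
import numpy as np

def augment_one_to_many(img, n, rng):
    """Generate n perturbed copies of one (H, W) grayscale face with values in [0, 1]."""
    out = []
    for _ in range(n):
        v = img[:, ::-1] if rng.random() < 0.5 else img        # random horizontal flip
        dy, dx = rng.integers(-2, 3, size=2)                     # small random translation
        v = np.roll(v, shift=(dy, dx), axis=(0, 1))
        v = np.clip(v * rng.uniform(0.8, 1.2), 0.0, 1.0)         # brightness jitter
        out.append(v)
    return np.stack(out)

def similarity_align(src, dst):
    """Least-squares similarity transform (scale, rotation, translation) mapping landmarks
    src (K, 2) onto template dst (K, 2), following Umeyama. Returns (A, t) with dst ~ src @ A.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s_c, d_c = src - mu_s, dst - mu_d
    cov = d_c.T @ s_c / len(src)                # cross-covariance of centered landmarks
    u, sig, vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0                            # forbid reflections
    rot = u @ np.diag(d) @ vt
    scale = (sig * d).sum() / s_c.var(axis=0).sum()
    a = scale * rot
    return a, mu_d - a @ mu_s
```

A 2D similarity transform can only remove in-plane rotation, scale, and shift; it cannot turn a profile face into a frontal one, which is exactly the gap that the GAN- and 3D-based normalization methods above try to close.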

Training Datasets

The analysis of large-scale datasets underpinning modern FR reveals:

Publicly Available Training Sets: Datasets such as CASIA-WebFace, MS-Celeb-1M [45], and VGGFace2 [39] have laid the foundation for large-scale model training. The survey emphasizes the balance between data depth (many images per identity) and breadth (many identities), as well as the need to address label noise and bias in large datasets.
Noise and Bias Considerations: The survey discusses the impact of noisy labels on model performance and the demographic biases present in training datasets, calling for cleaner and more representative data collection methods.
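
The depth/breadth distinction and a first-pass imbalance check are easy to compute from per-image metadata. The sketch below is a minimal illustration using only the Python standard library; the label and group lists are hypothetical inputs, and many real datasets do not ship reliable demographic metadata at all.

```python
from collections import Counter
from statistics import mean, median

def depth_and_breadth(labels):
    """Breadth = number of identities; depth = images per identity.
    `labels` is a hypothetical list with one identity label per training image."""
    per_identity = list(Counter(labels).values())
    return {
        "identities": len(per_identity),      # breadth
        "images": sum(per_identity),
        "mean_depth": mean(per_identity),
        "median_depth": median(per_identity),
        "min_depth": min(per_identity),
    }

def group_shares(groups):
    """Fraction of images per demographic group (hypothetical per-image metadata)."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}
```

A "wide" dataset scores high on `identities` but may have a low `median_depth`, while a "deep" one shows the opposite; strongly skewed `group_shares` are one simple signal of the demographic bias the survey warns about.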

Diverse Real-World Applications

The extension of FR into various real-world scenarios requires specialized approaches to handle:

Cross-Factor FR: Systems must handle diverse conditions like cross-pose (DREAM [215]), cross-age [226], and makeup variations [208].
Heterogeneous FR: This involves matching across different imaging modalities, such as VIS-NIR and photo-sketch pairs [196] [202].
Multiple/Single Media FR: Here, methods must effectively deal with data scarcity and cross-media comparisons, employing techniques like low-shot learning [137] and set-based recognition [83].
Industrial Applications: Industrial scenarios, such as 3D FR and FR on mobile devices, call for lightweight models and robust matching algorithms [87] [204].
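
Set-based recognition is often handled by aggregating the per-image embeddings of a set into a single template and comparing templates. The NumPy sketch below assumes the embeddings come from some pretrained deep model (not shown) and uses plain average pooling with a hypothetical decision threshold of 0.5; the methods cited above may instead use learned or quality-weighted aggregation.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)

def aggregate_template(embeddings):
    """Average-pool a set of per-image embeddings (N, D) into one unit-norm template."""
    return l2_normalize(l2_normalize(embeddings).mean(axis=0))

def verify(set_a, set_b, threshold=0.5):
    """Set-to-set verification: cosine similarity of the two templates vs. a hypothetical threshold."""
    score = float(aggregate_template(set_a) @ aggregate_template(set_b))
    return score, score >= threshold
```

Averaging normalized embeddings lets noisy individual frames cancel out, so a set of video frames or photos can be matched with one comparison instead of N x M pairwise ones; in practice the threshold would be tuned on a validation set for a target false accept rate.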

Future Directions and Challenges

The survey highlights several technical challenges and future directions for deep FR research:

Security and Privacy: Defending against spoofing and adversarial attacks, and ensuring privacy-preserving FR, remain critical [289] [290].
Bias Mitigation: Ongoing efforts to reduce racial, gender, and age biases in FR systems [173].
Understanding Face Representations: Further research is needed to understand the "identity capacity" of deep representations and the root causes of adversarial vulnerabilities [300].
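
To illustrate what an adversarial vulnerability looks like, the sketch below applies one step of the fast gradient sign method (FGSM), a standard attack chosen here for illustration rather than taken from the survey, to a toy linear "embedding" y = w @ x. The step lowers the cosine match score against an enrolled template while changing each input entry by at most eps. The linear model and all names are hypothetical; real deep FR models are nonlinear, and gradients there would come from automatic differentiation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_grad_wrt_input(x, w, template):
    """Gradient of cos(w @ x, template) w.r.t. x for a toy linear embedding y = w @ x."""
    y = w @ x
    ny, nt = np.linalg.norm(y), np.linalg.norm(template)
    s = (y @ template) / (ny * nt)
    dy = template / (ny * nt) - s * y / ny ** 2   # d cos / d y
    return w.T @ dy                               # chain rule through y = w @ x

def fgsm_dodge(x, w, template, eps):
    """One FGSM step that lowers the match score; each input entry moves by at most eps."""
    return x - eps * np.sign(cosine_grad_wrt_input(x, w, template))
```

For a small eps the score drops by roughly eps times the L1 norm of the gradient, so high-dimensional inputs with many small gradient entries can lose a noticeable amount of similarity from a perturbation that is tiny per pixel.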

Conclusion

This survey by Wang and Deng comprehensively covers the dynamic field of deep face recognition, marking notable developments and identifying crucial technical frontiers. The extensive analysis of network architectures, data handling methods, and application scenarios enriches the understanding of FR methods, guiding future research towards more robust and fair recognition systems. With the landscape continuously evolving, it is imperative that ongoing innovations address the emerging challenges in fairness, security, and practical deployment.
