LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition (2403.08161v1)

Published 13 Mar 2024 in cs.CV and cs.AI

Abstract: In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger magnitude of unlabeled faces exists in the real world. We explore the learning strategy of these unlabeled facial images through self-supervised pretraining to transfer generalized face recognition performance. Moreover, motivated by one recent finding, that is, the face saliency area is critical for face recognition, in contrast to utilizing random cropped blocks of images for constructing augmentations in pretraining, we utilize patches localized by extracted facial landmarks. This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representation that is more critical for face recognition. We also incorporate two landmark-specific augmentations which introduce more diversity of landmark information to further regularize the learning. With learned landmark-based facial representations, we further adapt the representation for face recognition with regularization mitigating variations in landmark positions. Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks, especially on more challenging few-shot scenarios.

Summary

The paper presents LAFS, a novel framework that leverages landmark-based patch extraction and specialized augmentations to enhance self-supervised face recognition.
It reports large gains in few-shot settings: in a 1-shot scenario on IJB-B, LAFS reaches a TAR@FAR=1e-4 of 38.05, compared with 14.13 for ResNet and 1.67 for fViT.
The method effectively harnesses unlabeled data to reduce dependency on extensive annotations, highlighting its promise for scalable facial recognition applications.

Landmark-based Facial Self-supervised Learning for Face Recognition

The paper, "LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition," authored by Zhonglin Sun, Chen Feng, Ioannis Patras, and Georgios Tzimiropoulos, presents a novel approach to face recognition using self-supervised learning (SSL). Its main concern is learning facial representations that can be adapted for face recognition, particularly when labels are unavailable.

Introduction

The authors start from an imbalance in face recognition data: unlabeled facial images vastly outnumber those in labeled datasets. Supervised models depend on labeled data, whereas SSL offers a way to use unlabeled data, which is abundant and often more diverse. The paper explores the efficacy of SSL for face recognition through a landmark-based method named LAndmark-based Facial Self-supervised learning (LAFS).

Methodology

The methodology section outlines the proposed LAFS framework. Key innovations include:

Landmark-based Patches: Unlike traditional methods that use random cropped blocks for augmentations, LAFS localizes patches based on extracted facial landmarks. This leverages the saliency of specific facial regions crucial for face recognition.
Two Landmark-specific Augmentations: The paper introduces augmentations—Landmark Shuffle and Landmark Coordinate Perturbation—to regularize the learning process. These augmentations enhance the diversity of landmark information, reinforcing the model's ability to generalize.
Representation Adaptation: After pretraining, LAFS adapts the learned landmark-based representation for face recognition, using a regularization that mitigates variations in landmark positions.

The overall pipeline pretrains on a large-scale unlabeled dataset and then fine-tunes on labeled datasets.
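The landmark-based ideas listed above can be illustrated with a minimal sketch. The summary does not give the paper's exact definitions, so everything here is an assumption made for illustration: the function names (`extract_patches`, `landmark_shuffle`, `landmark_perturb`), the square patch shape, the clamping at image borders, and the uniform noise model are hypothetical, and the two augmentations are only one plausible reading of their names. Landmark detection itself is assumed to have been done already.

```python
import random


def extract_patches(image, landmarks, patch_size):
    """Crop a square patch centred on each (x, y) landmark.

    `image` is a 2D list of rows (hypothetical stand-in for a real image),
    and `patch_size` is assumed to be no larger than either image dimension.
    """
    height, width = len(image), len(image[0])
    half = patch_size // 2
    patches = []
    for x, y in landmarks:
        # Clamp the top-left corner so the crop stays inside the image.
        left = min(max(int(round(x)) - half, 0), width - patch_size)
        top = min(max(int(round(y)) - half, 0), height - patch_size)
        rows = image[top:top + patch_size]
        patches.append([row[left:left + patch_size] for row in rows])
    return patches


def landmark_shuffle(landmarks, rng):
    """Assumed reading of 'Landmark Shuffle': permute the landmark order,
    which in turn permutes the sequence of extracted patches."""
    shuffled = list(landmarks)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


def landmark_perturb(landmarks, max_shift, rng):
    """Assumed reading of 'Landmark Coordinate Perturbation': jitter each
    coordinate by uniform noise in [-max_shift, max_shift]."""
    return [
        (x + rng.uniform(-max_shift, max_shift),
         y + rng.uniform(-max_shift, max_shift))
        for x, y in landmarks
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[r * 10 + c for c in range(10)] for r in range(10)]
    landmarks = [(5, 5), (0, 0), (9, 9)]
    view = landmark_perturb(landmark_shuffle(landmarks, rng), 1.5, rng)
    print(len(extract_patches(image, view, 4)))
```

In a self-supervised setup, two differently augmented landmark sets for the same face would typically yield the two views fed to the pretraining objective; the summary does not specify which objective LAFS uses.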

Experimental Evaluation

The experiments compare LAFS against state-of-the-art techniques. Key datasets include WebFace42M, MS1MV3, LFW, CFP-FP, AgeDB, IJB-B, and IJB-C, among others.

Key Findings and Numerical Results

Few-shot Learning: The proposed LAFS framework demonstrates significant improvements in few-shot face recognition scenarios. For instance, when pretraining on 1 million images (1-shot) and fine-tuning on few-shot datasets, LAFS outperforms existing methods. On IJB-B, in the 1-shot setting with 1% of the data, LAFS achieves a TAR@FAR=1e-4 of 38.05, a large improvement over baselines such as ResNet (14.13) and fViT (1.67).
Data Efficiency: Self-supervised pretraining on the 1-shot dataset substantially benefits the models, even when scaled to larger data (e.g., 10% of WebFace42M).
Augmentation Effectiveness: Landmark-based augmentations introduce meaningful perturbations that benefit overall model performance. For example, adding Landmark Shuffle brings a 2-3% accuracy increase in face recognition tasks.
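For readers unfamiliar with the metric quoted above, TAR@FAR is the true accept rate measured at the score threshold where the false accept rate does not exceed a target (here 1e-4). The sketch below shows the generic definition on lists of similarity scores. The function name `tar_at_far` and the strict `>` acceptance rule are assumptions for illustration, and it is not the IJB-B evaluation protocol, which the summary does not describe.

```python
def tar_at_far(genuine_scores, impostor_scores, far_target):
    """Fraction of genuine pairs accepted at a threshold whose false
    accept rate on impostor pairs is at most `far_target`.

    Scores are similarities: higher means 'more likely the same identity'.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be accepted within the FAR budget.
    allowed = int(far_target * len(impostors))
    # Accepting only scores strictly above this value admits at most
    # `allowed` impostor pairs (fewer if there are ties at the threshold).
    if allowed < len(impostors):
        threshold = impostors[allowed]
    else:
        threshold = float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)


if __name__ == "__main__":
    genuine = [0.9, 0.65, 0.6, 0.3]
    impostor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    print(tar_at_far(genuine, impostor, 0.25))
```

The function returns a fraction in [0, 1]; figures such as 38.05 in the text are the same quantity expressed as a percentage. A target as strict as 1e-4 needs at least tens of thousands of impostor pairs before any impostor can be accepted within the budget.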

Implications and Future Directions

Practically, the LAFS method makes it possible to use vast unlabeled datasets for effective face recognition, which can reduce the dependency on large annotated datasets, mitigating privacy concerns and reducing costs. Theoretically, the landmark-based approach provides insights into how salient regions affect model learning, suggesting possible extensions to other domains that require emphasis on localized information.

For future research, further investigation could explore refining landmark detection during the SSL training phase and extending the approach to other face-related tasks such as emotion recognition or facial expression analysis.

Conclusion

The LAFS framework presents a compelling advancement in self-supervised learning for face recognition, integrating landmark-based strategies to harness unlabeled data efficiently. This paper's findings emphasize the potential of SSL in achieving high-accuracy face recognition with limited labeled data, pushing the boundaries of current face recognition technology.
