LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition (2403.08161v1)
Abstract: In this work, we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. First, the real world contains vastly more unlabeled faces than existing labelled face datasets do. We explore how to learn from these unlabeled facial images through self-supervised pretraining, so that the learned representation transfers to face recognition and generalizes well. Moreover, we are motivated by a recent finding that salient facial regions are critical for face recognition. Instead of building pretraining augmentations from randomly cropped image blocks, we therefore build them from patches localized by extracted facial landmarks. This enables our method, LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition. We also incorporate two landmark-specific augmentations that introduce more diversity in landmark information to further regularize learning. With the learned landmark-based facial representations, we further adapt the representation for face recognition with a regularization that mitigates variations in landmark positions. Our method achieves significant improvements over the state of the art on multiple face recognition benchmarks, especially in the more challenging few-shot scenarios.
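The contrast drawn above, between randomly cropped blocks and patches localized by facial landmarks, can be sketched as follows. This is a minimal illustration under stated assumptions and not the LAFS implementation. The function names, the square patch shape, the rule that shifts border patches inward, and the 112x112 five-landmark example are all hypothetical choices made here. Landmark coordinates are assumed to come from an external detector as (x, y) pixel positions.

```python
import numpy as np


def landmark_patches(image: np.ndarray, landmarks: np.ndarray, patch_size: int) -> np.ndarray:
    """Hypothetical helper: extract one square patch centred on each landmark.

    image: H x W x C array; landmarks: K x 2 array of (x, y) pixel coordinates.
    Returns a K x patch_size x patch_size x C array.
    Assumption: patches near the border are shifted inward so they stay inside
    the image (padding would be an equally valid alternative).
    """
    h, w = image.shape[:2]
    if patch_size > min(h, w):
        raise ValueError("patch_size must not exceed the image size")
    half = patch_size // 2
    patches = []
    for x, y in np.rint(landmarks).astype(int):
        # Top-left corner of a window centred on the landmark, clamped to the image.
        top = int(np.clip(y - half, 0, h - patch_size))
        left = int(np.clip(x - half, 0, w - patch_size))
        patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)


def random_patches(image: np.ndarray, num_patches: int, patch_size: int,
                   rng: np.random.Generator) -> np.ndarray:
    """Hypothetical baseline: patches at uniformly random positions, ignoring facial structure."""
    h, w = image.shape[:2]
    # Generator.integers excludes the upper bound, so +1 keeps the last valid corner reachable.
    tops = rng.integers(0, h - patch_size + 1, size=num_patches)
    lefts = rng.integers(0, w - patch_size + 1, size=num_patches)
    return np.stack([image[t:t + patch_size, l:l + patch_size] for t, l in zip(tops, lefts)])


# Illustrative usage: a dummy 112x112 RGB face and five illustrative landmark
# positions (eyes, nose tip, mouth corners) roughly matching an aligned face.
image = np.arange(112 * 112 * 3).reshape(112, 112, 3)
landmarks = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2]])
face_patches = landmark_patches(image, landmarks, patch_size=16)
baseline = random_patches(image, num_patches=5, patch_size=16, rng=np.random.default_rng(0))
```

Both functions return patch stacks of the same shape, so either could feed the same downstream encoder. The difference is only where the patches come from: the landmark version concentrates them on the eyes, nose and mouth, which is the salient-region intuition the abstract describes.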