FRoundation: Are Foundation Models Ready for Face Recognition? (2410.23831v3)
Abstract: Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse, large-scale datasets, making them broadly applicable to various downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition (FR). We further propose and demonstrate the adaptation of these models for FR across different levels of data availability, including synthetic data. Extensive experiments are conducted on multiple foundation models and on datasets of varying scales for training and fine-tuning, with evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models tend to underperform in FR compared with similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch, particularly when training data is limited. For example, after fine-tuning on only 1K identities, DINOv2 ViT-S achieved an average verification accuracy of 87.10% on the LFW, CALFW, CPLFW, CFP-FP, and AgeDB30 benchmarks, compared to 64.70% for the same model without fine-tuning, while the same architecture, ViT-S, trained from scratch on 1K identities reached 69.96%. With access to larger-scale FR training datasets, these performances reach 96.03% and 95.59% for the DINOv2 and CLIP ViT-L models, respectively. Compared to ViT-based architectures trained from scratch for FR, the same architectures fine-tuned from foundation models achieve similar performance while requiring lower training computational cost and not relying on the assumption of extensive data availability. We further demonstrate the use of synthetic face data, showing improved performance over both pre-trained foundation models and ViT models.
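The size of the low-data gains quoted in the abstract is easier to see as percentage-point differences. The following is a minimal sketch that uses only the three 1K-identity accuracies stated above; the dictionary keys and the helper `gain_in_points` are illustrative names, not part of the paper.

```python
# Average verification accuracies (%) quoted in the abstract, averaged over
# LFW, CALFW, CPLFW, CFP-FP, and AgeDB30. Key names are illustrative only.
REPORTED = {
    "dinov2_vits_pretrained": 64.70,        # DINOv2 ViT-S without fine-tuning
    "vits_scratch_1k_ids": 69.96,           # ViT-S trained from scratch on 1K identities
    "dinov2_vits_finetuned_1k_ids": 87.10,  # DINOv2 ViT-S fine-tuned on 1K identities
}


def gain_in_points(after: float, before: float) -> float:
    """Return the accuracy difference in percentage points, rounded to 2 decimals."""
    return round(after - before, 2)


# Gain from fine-tuning relative to the pre-trained model used as-is.
gain_over_pretrained = gain_in_points(
    REPORTED["dinov2_vits_finetuned_1k_ids"], REPORTED["dinov2_vits_pretrained"]
)

# Gain from fine-tuning relative to training the same architecture from scratch.
gain_over_scratch = gain_in_points(
    REPORTED["dinov2_vits_finetuned_1k_ids"], REPORTED["vits_scratch_1k_ids"]
)

print(f"Fine-tuned vs. pre-trained: +{gain_over_pretrained} points")
print(f"Fine-tuned vs. from scratch: +{gain_over_scratch} points")
```

Both differences are absolute percentage points on the five-benchmark average, not relative improvements.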