What is a Goldilocks Face Verification Test Set? (2405.15965v1)
Abstract: Face recognition models are commonly trained with web-scraped datasets containing millions of images and evaluated on test sets emphasizing pose, age and mixed attributes. With train and test sets both assembled from web-scraped images, it is critical to ensure disjoint sets of identities between train and test sets. However, existing train and test sets have not considered this. Moreover, as accuracy levels become saturated, such as LFW $>99.8\%$, more challenging test sets are needed. We show that current train and test sets are generally not identity- or even image-disjoint, and that this results in an optimistic bias in the estimated accuracy. In addition, we show that identity-disjoint folds are important in the 10-fold cross-validation estimate of test accuracy. To better support continued advances in face recognition, we introduce two "Goldilocks" test sets, Hadrian and Eclipse. The former emphasizes challenging facial hairstyles and the latter emphasizes challenging over- and under-exposure conditions. Images in both datasets are from a large, controlled-acquisition (not web-scraped) dataset, so they are identity- and image-disjoint with all popular training sets. Accuracy for these new test sets generally falls below that observed on LFW, CPLFW, CALFW, CFP-FP and AgeDB-30, showing that these datasets represent important dimensions for improvement of face recognition. The datasets are available at: https://github.com/HaiyuWu/SOTA-Face-Recognition-Train-and-Test
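The two disjointness requirements in the abstract (no identity shared between train and test sets, and no identity shared across cross-validation folds) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, and it assumes each identity already has a label that is comparable across datasets, which web-scraped sets do not guarantee in practice.

```python
import random


def shared_identities(train_ids, test_ids):
    """Return the identity labels that appear in both collections.

    Assumption: labels are directly comparable across datasets.
    """
    return set(train_ids) & set(test_ids)


def split_identities(identities, n_folds=10, seed=0):
    """Assign each identity to exactly one fold.

    Pairs for a fold would then be formed only from that fold's
    identities, so no identity is shared between folds.
    """
    ids = sorted(set(identities))      # deduplicate, fix a stable order
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    return [ids[i::n_folds] for i in range(n_folds)]


def folds_are_identity_disjoint(folds):
    """Check that no identity label occurs in more than one fold."""
    seen = set()
    for fold in folds:
        fold_ids = set(fold)
        if seen & fold_ids:
            return False
        seen |= fold_ids
    return True


if __name__ == "__main__":
    folds = split_identities([f"id{i}" for i in range(25)], n_folds=10)
    print(folds_are_identity_disjoint(folds))
    print(shared_identities(["a", "b"], ["b", "c"]))
```

Splitting by identity first, rather than by image pair, is the design choice that makes the folds disjoint by construction; splitting a fixed pair list at random offers no such guarantee.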
- Haiyu Wu
- Sicong Tian
- Aman Bhatta
- Jacob Gutierrez
- Grace Bezold
- Genesis Argueta
- Karl Ricanek Jr.
- Michael C. King
- Kevin W. Bowyer