- The paper presents a novel benchmark that uses 260 million web-sourced images and an iterative cleaning process (CAST) to form high-quality datasets.
- The paper introduces the FRUITS protocol, which evaluates face recognition models under fixed inference-time budgets.
- The paper shows that models trained on the refined WebFace42M achieve up to a 40% error rate reduction on the IJB-C benchmark and top performance in NIST-FRVT.
An Analysis of WebFace260M: Evaluating Million-Scale Deep Face Recognition
The paper presents WebFace260M, a novel large-scale benchmark for deep face recognition, which aims to bridge the data gap between academia and industry. The benchmark comprises two significant datasets: a noisy set, WebFace260M, with 4 million identities and 260 million faces, and a refined dataset, WebFace42M, containing 2 million identities and 42 million faces. Through this extensive collection of web-sourced face data, the paper seeks to provide a robust platform for studying deep learning models in the context of face recognition under practical constraints.
Dataset Construction and Cleaning Methodology
The authors employed a sophisticated data collection and cleaning process, named Cleaning Automatically by Self-Training (CAST), which automatically curates high-quality data from a vast, noisy corpus. Initially, a name list of 4 million celebrities was compiled from Freebase and IMDb. Using search engines, 260 million face images were downloaded. CAST then applies self-training: models are iteratively trained on the current data and used to separate genuine samples from noisy ones, yielding progressively cleaner data. Over three iterations of intra-class and inter-class cleaning, using techniques such as DBSCAN clustering and cosine-similarity-based filtering, the authors refined the collection into WebFace42M with a drastically reduced noise level.
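The two cleaning steps can be sketched as follows. This is a minimal illustration in the spirit of CAST, not the authors' exact pipeline: the function names, thresholds, and embedding inputs are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def intra_class_clean(embeddings, eps=0.4, min_samples=3):
    """One intra-class cleaning pass (sketch): cluster one identity's
    L2-normalized face embeddings with DBSCAN on cosine distance and
    keep only the dominant cluster, discarding outliers as noise."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="cosine").fit_predict(normed)
    clustered = labels[labels >= 0]
    if clustered.size == 0:
        return np.zeros(len(embeddings), dtype=bool)  # everything was noise
    dominant = np.bincount(clustered).argmax()
    return labels == dominant  # boolean keep-mask over the images


def inter_class_merge(center_a, center_b, threshold=0.7):
    """Inter-class cleaning (sketch): flag two identity centers as
    duplicates when their cosine similarity exceeds a threshold."""
    a = center_a / np.linalg.norm(center_a)
    b = center_b / np.linalg.norm(center_b)
    return float(a @ b) > threshold
```

In a full self-training loop, the model retrained on the kept samples would produce better embeddings for the next cleaning iteration.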
Evaluation Protocol and Testing
For the evaluation of face recognition systems, the paper introduces the Face Recognition Under Inference Time conStraint (FRUITS) protocol, which places explicit emphasis on time-constrained scenarios. FRUITS is notable for its alignment with real-world applications, imposing fixed time budgets of 100, 500, and 1000 milliseconds. This enables a more practical assessment of face recognition models across different computational environments, accommodating architectures ranging from lightweight networks to heavy-duty deep models.
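The time-budget idea behind FRUITS can be illustrated with a simple latency check. This is a sketch only: the helper names are hypothetical, and the official protocol defines its own measurement hardware and rules.

```python
import time

FRUITS_BUDGETS_MS = (100, 500, 1000)  # the three time limits described above


def measure_latency_ms(pipeline, images, warmup=3, runs=10):
    """Average per-image wall-clock latency of a recognition pipeline,
    with a few warmup calls excluded from timing (a sketch, not the
    official FRUITS measurement procedure)."""
    for img in images[:warmup]:
        pipeline(img)
    start = time.perf_counter()
    for _ in range(runs):
        for img in images:
            pipeline(img)
    elapsed = time.perf_counter() - start
    return elapsed / (runs * len(images)) * 1000.0


def eligible_tracks(latency_ms, budgets=FRUITS_BUDGETS_MS):
    """Return the time-budget tracks whose constraint the model meets."""
    return [b for b in budgets if latency_ms <= b]
```

A model measured at 80 ms per image would qualify for all three tracks, while one at 600 ms would only be evaluated under the 1000 ms constraint.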
To complement this protocol, the authors curated a new test set enriched with attributes like age, gender, and race, offering richer insights into model biases and performance across demographic variations. Results from the FRUITS evaluation suggest considerable potential for improvement, particularly in systems working under severe computational constraints.
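The kind of per-attribute analysis this attribute-enriched test set enables can be sketched as below; the `results` format and helper name are assumptions for illustration, not the paper's API.

```python
from collections import defaultdict


def accuracy_by_attribute(results):
    """Break recognition accuracy down by demographic attribute value.

    `results` is a hypothetical list of (attribute_value, is_correct)
    pairs, e.g. one entry per verification trial. Returns a mapping
    from attribute value to accuracy, exposing performance gaps."""
    totals = defaultdict(lambda: [0, 0])  # value -> [num_correct, num_total]
    for value, correct in results:
        totals[value][0] += int(correct)
        totals[value][1] += 1
    return {value: correct / total
            for value, (correct, total) in totals.items()}
```

Running the same breakdown over age, gender, and race attributes gives the demographic view of model bias the test set is designed to support.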
Results and Key Findings
The dataset and protocol enabled comprehensive experiments across numerous face recognition models, including prominent architectures such as ResNet and EfficientNet. Models trained on WebFace42M significantly outperformed those trained on existing public datasets, achieving a 40% reduction in error rates on the challenging IJB-C benchmark and ranking third among 430 entries in the NIST-FRVT test.
This research underscores the critical role of large-scale, cleaned datasets in enhancing face recognition performance. Despite these contributions, the authors acknowledge potential sources of bias within the dataset, such as imbalances across attributes and demographics. Future work could explore bias-mitigation methods and apply WebFace260M to other face-related tasks.
Implications and Future Directions
WebFace260M and its associated evaluation procedures represent a significant contribution to the field, offering valuable resources to academic research that was formerly hamstrung by limited access to high-quality, large-scale face datasets. The paper suggests these advancements will stimulate further progress in face recognition technology and help narrow the capability gap between academic research and industry.
Looking ahead, this work sets the groundwork for future research that can build on these datasets and protocols to explore broader realms such as fairness in AI, domain adaptation, and the deployment of face recognition systems in diverse application scenarios. The scope for future developments includes optimizing training with large-batch distributed systems, improving data cleaning techniques, and ensuring demographic diversity and fairness in model performance.