The MegaFace Benchmark: 1 Million Faces for Recognition at Scale (1512.00596v1)

Published 2 Dec 2015 in cs.CV

Abstract: Recent face recognition experiments on a major benchmark LFW show stunning performance--a number of algorithms achieve near to perfect score, surpassing human recognition rates. In this paper, we advocate evaluations at the million scale (LFW includes only 13K photos of 5K people). To this end, we have assembled the MegaFace dataset and created the first MegaFace challenge. Our dataset includes One Million photos that capture more than 690K different individuals. The challenge evaluates performance of algorithms with increasing numbers of distractors (going from 10 to 1M) in the gallery set. We present both identification and verification performance, evaluate performance with respect to pose and a person's age, and compare as a function of training data size (number of photos and people). We report results of state of the art and baseline algorithms. Our key observations are that testing at the million scale reveals big performance differences (of algorithms that perform similarly well on smaller scale) and that age invariant recognition as well as pose are still challenging for most. The MegaFace dataset, baseline code, and evaluation scripts, are all publicly released for further experimentations at: megaface.cs.washington.edu.

Citations (843)

Summary

  • The paper demonstrates that state-of-the-art recognition algorithms see a significant accuracy drop when evaluated with one million distractor images.
  • The paper shows that larger training datasets, such as FaceNet's 500M images, yield better performance at scale compared to algorithms trained on smaller datasets.
  • The evaluation highlights crucial challenges in age-invariant and pose-variant face recognition, guiding future research directions.

Overview of the MegaFace Benchmark Paper

The paper "The MegaFace Benchmark: 1 Million Faces for Recognition at Scale," authored by Ira Kemelmacher-Shlizerman, Steve Seitz, Daniel Miller, and Evan Brossard from the University of Washington, presents a comprehensive evaluation framework for face recognition algorithms operating on large-scale datasets. The MegaFace benchmark introduces an unprecedented scale in terms of both the number of photos and the diversity of individuals, making a significant contribution to the field of face recognition.

Introduction and Motivation

The motivation for the MegaFace benchmark arises from the observation that leading face recognition algorithms had already surpassed human performance on existing benchmarks such as Labeled Faces in the Wild (LFW), which comprises only 13K photos of 5K individuals. These benchmarks, however, do not adequately represent the challenges encountered in real-world applications, such as recognizing individuals in a database of billions of people. The MegaFace dataset was conceived to address this gap by scaling the number of "distractors" in the gallery up to one million, thereby probing the fundamental limits of current algorithms.

Dataset and Evaluation Protocols

The MegaFace dataset includes one million images spanning more than 690K unique individuals, drawn from the Yahoo Flickr Creative Commons 100M (YFCC100M) dataset. The assembly methodology emphasized diversity, unconstrained imaging conditions, and a large number of unique identities, relying on automated face detection and filtering to make it likely that the detected faces belong to distinct people.
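The paper's exact detection pipeline is not reproduced here; purely as a rough illustration of the filtering step, the sketch below uses OpenCV's stock Haar cascade (an assumption, not the authors' detector) to keep only images in which at least one face is found. The `yfcc_sample` directory name is hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): filter a directory of
# YFCC100M-style images down to those containing a detectable face,
# using OpenCV's bundled Haar cascade as a stand-in detector.
import os
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def has_face(path, min_size=(50, 50)):
    """Return True if at least one face is detected in the image."""
    img = cv2.imread(path)
    if img is None:          # unreadable or non-image file
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5, minSize=min_size)
    return len(faces) > 0

# "yfcc_sample" is a hypothetical folder of candidate images.
kept = [f for f in os.listdir("yfcc_sample")
        if has_face(os.path.join("yfcc_sample", f))]
print(f"{len(kept)} images retained as gallery candidates")
```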

The benchmark utilizes two probe sets for evaluation: the FaceScrub dataset, which includes 100K images of 530 celebrities, and the FG-NET dataset, which contains 975 images of 82 individuals with significant age variation. The evaluation focuses on two tasks: identification (ranking gallery images by similarity to a probe and checking whether a correct match appears near the top) and verification (a binary decision of whether two faces belong to the same individual). The benchmark reports performance as the number of distractors increases from 10 to 1 million.
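To make the two protocols concrete, here is a minimal sketch of rank-1 identification and of verification at a fixed false accept rate, assuming precomputed, L2-normalized embeddings compared by cosine similarity (assumptions for illustration; the benchmark itself is agnostic to the underlying feature representation):

```python
# Minimal sketch of the two MegaFace tasks, assuming L2-normalized face
# embeddings as input (the embedding model itself is out of scope here).
import numpy as np

def rank1_identification(probe, true_match, distractors):
    """Identification: rank the gallery by similarity to the probe and
    check whether the true match of the probe's identity comes first.
    probe, true_match: (d,) unit vectors; distractors: (n, d) unit vectors."""
    gallery = np.vstack([true_match[None, :], distractors])
    scores = gallery @ probe              # cosine similarity for unit vectors
    return int(np.argmax(scores) == 0)    # row 0 holds the true match

def tar_at_far(pos_scores, neg_scores, far=1e-3):
    """Verification: true accept rate at a fixed false accept rate,
    i.e. one point on the ROC curve the benchmark reports."""
    threshold = np.quantile(neg_scores, 1.0 - far)
    return float(np.mean(pos_scores >= threshold))

# Toy usage with random unit vectors standing in for embeddings; the
# actual benchmark grows the distractor set from 10 up to 1,000,000.
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
d = 128
probe = unit(rng.normal(size=d))
true_match = unit(probe + 0.3 * rng.normal(size=d))  # same identity, perturbed
distractors = unit(rng.normal(size=(10_000, d)))
print(rank1_identification(probe, true_match, distractors))

neg_scores = distractors @ probe                 # different-identity pairs
pos_scores = np.array([probe @ true_match])      # a single same-identity pair
print(tar_at_far(pos_scores, neg_scores, far=1e-3))
```

In the actual benchmark these statistics are aggregated over all probes in FaceScrub or FG-NET and reported as curves against gallery size.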

Key Findings and Results

The paper presents several critical findings from the evaluation results:

  1. Scalability of Algorithms:
    • Algorithms achieving over 95% accuracy on LFW drop to identification rates of roughly 35-75% with one million distractors.
    • Baseline algorithms like Joint Bayes and LBP, which perform reasonably well on smaller datasets, degrade substantially, with identification rates below 10% at the million scale.
  2. Influence of Training Data Size:
    • Algorithms trained on larger datasets tend to perform better at scale. For example, Google's FaceNet, trained on over 500M photos, shows superior performance compared to other algorithms.
    • The comparison between FaceNet and FaceN, however, points to data efficiency: FaceN, trained on only 18M photos, compares favorably to FaceNet.
  3. Age-Invariant Recognition:
    • Recognition performance is significantly affected by age variation, particularly with younger individuals and larger age gaps.
    • Top-performing algorithms like FaceNet maintain commendable performance but still show accuracy drops on the age-varying FG-NET probe set.
  4. Pose Variation:
    • Recognition rates decrease as the pose difference between probe and gallery images grows.
    • The effect of pose is more pronounced at scale, with markedly lower performance when probe and gallery differ significantly in yaw angle (see the sketch after this list).
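As referenced in item 4 above, here is a hedged sketch of the pose analysis: it buckets probe/gallery pairs by absolute yaw difference and reports mean rank-1 accuracy per bucket. The yaw angles, bucket edges, and per-pair outcomes are illustrative assumptions, not the released evaluation code.

```python
# Sketch: accuracy as a function of probe/gallery yaw difference.
import numpy as np

def accuracy_by_yaw_gap(hits, yaw_probe, yaw_gallery,
                        edges=(0, 10, 20, 40, 90)):
    """hits: (m,) 0/1 rank-1 outcomes for m probe/gallery pairs.
    yaw_probe, yaw_gallery: (m,) estimated yaw angles in degrees."""
    gaps = np.abs(np.asarray(yaw_probe) - np.asarray(yaw_gallery))
    hits = np.asarray(hits, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (gaps >= lo) & (gaps < hi)
        if mask.any():
            print(f"yaw gap {lo:>2}-{hi:<2} deg: "
                  f"{hits[mask].mean():.3f} over {mask.sum()} pairs")

# Toy usage with random outcomes and yaw angles (illustrative only).
rng = np.random.default_rng(1)
m = 1000
yaw_p, yaw_g = rng.uniform(-45, 45, m), rng.uniform(-45, 45, m)
hits = rng.integers(0, 2, m)
accuracy_by_yaw_gap(hits, yaw_p, yaw_g)
```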

Implications and Future Directions

The implications of this work are far-reaching, both practically and theoretically. The MegaFace benchmark sets a new standard for the evaluation of face recognition algorithms, pushing the field towards developing models that can operate effectively at a planetary scale. The benchmark highlights the limitations of current algorithms, spurring further research into handling large-scale distractors, improving age-invariant recognition, and managing pose variations.

Future developments could include expanding the dataset further, leveraging higher-resolution imagery, and providing large-scale training data to the broader research community. Additionally, exploring new solutions to address the degradation in performance due to age and pose variations will be crucial. The dataset's public availability facilitates ongoing improvements and reproducible research, ensuring the benchmark remains relevant as new methodologies and technologies emerge.

Conclusion

The MegaFace benchmark represents a significant advancement in evaluating face recognition technologies. By incorporating a large-scale, diverse dataset and rigorous evaluation protocols, it provides valuable insights into the current state of the art while setting the stage for future innovations. The findings underscore the complexities of face recognition at scale, emphasizing the need for continual improvement in algorithms to achieve reliable performance in real-world applications.