On the use of automatically generated synthetic image datasets for benchmarking face recognition (2106.04215v1)

Published 8 Jun 2021 in cs.CV

Abstract: The availability of large-scale face datasets has been key to the progress of face recognition. However, due to licensing issues or copyright infringement, some datasets are no longer available (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs) for synthesizing realistic face images provide a pathway to replace real datasets with synthetic ones, both to train and to benchmark face recognition (FR) systems. This paper presents a study on benchmarking FR systems using a synthetic dataset. First, we introduce a methodology to generate a synthetic dataset without human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, which is verified on a synthetic dataset with 10K+ identities; and (ii) benchmarking results on the synthetic dataset are a good substitute for those on the real dataset, often yielding error rates and system rankings similar to benchmarking on the real dataset.
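Claim (ii) rests on comparing error rates and system rankings between a real and a synthetic benchmark. As a minimal sketch of how such a comparison can be computed, the Python below derives the false non-match rate (FNMR) at a fixed false match rate (FMR) from genuine and impostor comparison scores, then measures ranking agreement across systems with Kendall's tau. The helper names, the FNMR-at-fixed-FMR metric, the "higher score means more similar" convention and the use of Kendall's tau are all assumptions for illustration, not necessarily the paper's evaluation protocol.

```python
from itertools import combinations


def threshold_at_fmr(impostor_scores, target_fmr):
    """Decision threshold giving FMR <= target_fmr (hypothetical helper).

    Assumed convention: higher score = more similar, and a comparison
    is accepted as a match when score > threshold.
    """
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor comparisons we may wrongly accept
    # (the small epsilon guards against round-off, e.g. 0.29 * 100).
    allowed = int(target_fmr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every comparison may be accepted
    # At most `allowed` impostor scores lie strictly above ranked[allowed].
    return ranked[allowed]


def fnmr_at_fmr(genuine_scores, impostor_scores, target_fmr):
    """Fraction of genuine comparisons rejected at the FMR-derived threshold."""
    if not genuine_scores:
        raise ValueError("need at least one genuine score")
    thr = threshold_at_fmr(impostor_scores, target_fmr)
    rejected = sum(1 for s in genuine_scores if s <= thr)
    return rejected / len(genuine_scores)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two per-system error-rate lists (same order)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with >= 2 systems")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not results from the paper).
    impostor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    genuine = [0.85, 0.95, 0.99, 0.5]
    print(fnmr_at_fmr(genuine, impostor, target_fmr=0.1))  # 0.5

    # Hypothetical FNMR of three FR systems on a real and a synthetic benchmark.
    real = [0.02, 0.05, 0.10]
    synthetic = [0.03, 0.04, 0.12]
    print(kendall_tau(real, synthetic))  # 1.0: identical system ranking
```

A tau close to 1 means the synthetic benchmark orders the systems as the real one does. Tau-a does not correct for ties, so a tie-aware variant such as tau-b may be preferable when several systems reach identical error rates.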

Authors (3)

Laurent Colbois (6 papers)
Tiago de Freitas Pereira (4 papers)
Sébastien Marcel (39 papers)

Citations (34)