
DCFace: Synthetic Face Generation with Dual Condition Diffusion Model (2304.07060v1)

Published 14 Apr 2023 in cs.CV

Abstract: Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high fidelity images. It involves generating multiple images of the same subjects under different factors (e.g., variations in pose, illumination, expression, aging and occlusion) that follow the real-image conditional distribution. Previous works have studied the generation of synthetic datasets using GAN or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace provide higher verification accuracies compared to previous works by 6.11% on average in 4 out of 5 test datasets: LFW, CFP-FP, CPLFW, AgeDB and CALFW. Code is available at https://github.com/mk-minchul/dcface

Citations (78)

Summary

  • The paper introduces a Dual Condition Face Generator using a denoising diffusion model to synthesize facial images with controlled identity and style variations.
  • The paper implements a patch-wise style extractor and time-step dependent ID loss to maintain label consistency while enhancing image diversity.
  • The paper reports a 6.11% boost in verification accuracy on benchmark datasets, underlining its potential for privacy-preserving synthetic face data generation.

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

The paper introduces DCFace, a method for generating synthetic face datasets aimed at improving the performance of face recognition (FR) models. By harnessing dual-condition diffusion models, DCFace addresses key challenges in synthetic dataset generation: maintaining label consistency and enhancing diversity while ensuring the uniqueness of generated subjects. The approach overcomes limitations of previous methods by introducing a diffusion-based framework that conditions on both identity (ID) and style, allowing the creation of face images with controlled inter-class and intra-class variations.

Technical Contribution

The primary innovation of the paper is the Dual Condition Face Generator, which uses a denoising diffusion probabilistic model (DDPM) to synthesize facial images. The generator takes two conditioning inputs: an ID image that dictates the subject's appearance, and a style image that determines other visual factors such as pose, illumination, and expression. The method thus offers finer control over the generation process than existing generative adversarial network (GAN) or 3D model-based approaches.
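To make the dual-condition setup concrete, here is a minimal sketch of a noise-prediction network that consumes both conditions alongside the timestep. Everything below (the class name, dimensions, and the toy two-convolution backbone standing in for a U-Net) is an illustrative assumption, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualConditionDenoiser(nn.Module):
    """Toy noise-prediction network conditioned on ID, style, and timestep."""
    def __init__(self, id_dim=512, style_dim=512, hidden=64, timesteps=1000):
        super().__init__()
        self.conv_in = nn.Conv2d(3, hidden, 3, padding=1)   # stand-in for a real U-Net
        self.conv_out = nn.Conv2d(hidden, 3, 3, padding=1)
        self.id_proj = nn.Linear(id_dim, hidden)            # ID condition -> channel bias
        self.style_proj = nn.Linear(style_dim, hidden)      # style condition -> channel bias
        self.t_embed = nn.Embedding(timesteps, hidden)      # timestep embedding

    def forward(self, x_t, t, id_emb, style_emb):
        # x_t: noisy image (B, 3, H, W); t: integer timesteps (B,)
        h = self.conv_in(x_t)
        cond = self.id_proj(id_emb) + self.style_proj(style_emb) + self.t_embed(t)
        h = F.silu(h + cond[:, :, None, None])              # inject conditions channel-wise
        return self.conv_out(h)                             # predicted noise
```

In a training loop one would call something like `eps = model(x_t, t, id_encoder(id_img), style_extractor(style_img))`, where the ID encoder and style extractor are the conditioning branches described next.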

To implement this dual condition framework, the paper proposes:

  1. Patch-wise Style Extractor: This component extracts style information by dividing the chosen style image into patches, preserving spatial information. It restricts how much ID information leaks into the style vectors, encouraging the model to rely on the ID condition during synthesis (see the first sketch after this list).
  2. Time-step Dependent ID Loss: This component applies a time-dependent weighting to the ID loss, balancing between preserving the subject's identity and adapting the style throughout the denoising process. The weighting maintains ID consistency while still allowing style changes (see the second sketch after this list).
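A minimal sketch of the patch-wise idea follows. The patch size, projection layer, and class name are assumptions made for illustration; the paper's actual module is more elaborate, but the core idea of per-patch style vectors that preserve coarse spatial layout is the same.

```python
import torch
import torch.nn as nn

class PatchwiseStyleExtractor(nn.Module):
    """Splits a style image into a grid of patches and encodes each patch into
    a style vector, keeping coarse spatial layout while limiting how much
    identity detail any single vector can carry."""
    def __init__(self, patch=16, style_dim=128):
        super().__init__()
        self.patch = patch
        # one shared linear projection applied to every flattened RGB patch
        self.proj = nn.Linear(3 * patch * patch, style_dim)

    def forward(self, style_img):                              # (B, 3, H, W)
        p = self.patch
        patches = style_img.unfold(2, p, p).unfold(3, p, p)    # (B, 3, H/p, W/p, p, p)
        B, C, gh, gw, _, _ = patches.shape
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, gh * gw, -1)
        return self.proj(patches)                              # (B, num_patches, style_dim)
```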
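For the time-step dependent ID loss, the sketch below shows the general shape of the idea: an identity loss between the predicted clean image and the ID image, down-weighted at large timesteps where the sample is still mostly noise and identity is not yet recoverable. The linear weighting ramp and the function signature are stand-ins; the paper derives its own schedule.

```python
import torch
import torch.nn.functional as F

def time_dependent_id_loss(x0_pred, id_img, face_encoder, t, T=1000):
    """Illustrative sketch: cosine ID loss weighted by timestep.
    face_encoder is any pretrained face-recognition feature extractor."""
    f_pred = F.normalize(face_encoder(x0_pred), dim=-1)
    f_id = F.normalize(face_encoder(id_img), dim=-1)
    id_loss = 1.0 - (f_pred * f_id).sum(dim=-1)   # per-sample cosine distance
    weight = 1.0 - t.float() / T                  # assumed ramp: strongest near t = 0
    return (weight * id_loss).mean()
```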

Evaluations and Results

The performance of DCFace is evaluated against established synthetic datasets such as SynFace and DigiFace. The results show a notable improvement in face verification across several benchmark datasets, including LFW, CFP-FP, CPLFW, AgeDB, and CALFW. Notably, DCFace achieves an average increase of 6.11% in verification accuracy over previous state-of-the-art methods on 4 of the 5 benchmarks, using a modest dataset of 0.5 million images and demonstrating a refined balance between style diversity and label consistency.

Furthermore, the proposed metrics of uniqueness, consistency, and diversity provide a quantitative analysis of the generated datasets that extends beyond verification accuracy, showcasing DCFace's strength in crafting datasets that are diverse yet label-consistent across a large number of subjects. A rough computational sketch of these three metrics follows.
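The sketch below gives illustrative proxies for the three metrics over face-recognition embeddings, assuming contiguous integer subject labels. These are simplified stand-ins written for this summary; the paper's formal definitions differ in detail.

```python
import torch
import torch.nn.functional as F

def dataset_metrics(embeddings, labels, sim_thresh=0.3):
    # embeddings: (N, D) face features; labels: (N,) subject ids in 0..K-1
    emb = F.normalize(embeddings, dim=-1)
    num_subjects = int(labels.max()) + 1
    centers = torch.stack([emb[labels == s].mean(0) for s in range(num_subjects)])
    centers = F.normalize(centers, dim=-1)

    # uniqueness: fraction of subject-center pairs that are clearly distinct
    pair_sim = centers @ centers.T
    off_diag = pair_sim[~torch.eye(num_subjects, dtype=torch.bool)]
    uniqueness = (off_diag < sim_thresh).float().mean().item()

    # consistency: how close each image sits to its own subject center
    consistency = (emb * centers[labels]).sum(-1).mean().item()

    # diversity: average within-subject spread (rough proxy; self-pairs included)
    spreads = []
    for s in range(num_subjects):
        e = emb[labels == s]
        spreads.append((1.0 - e @ e.T).mean())
    diversity = torch.stack(spreads).mean().item()

    return uniqueness, consistency, diversity
```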

Implications and Future Work

The advancement of DCFace underscores its potential for reducing dependence on large-scale web-crawled datasets for training FR models. By offering privacy-preserving synthetic alternatives, this work helps address ethical concerns associated with the consent-less use of real facial data. Additionally, the ability to tune the balance between ID and style conditions yields datasets adaptable to various training scenarios and domain-specific needs.

Future research can explore improving the 3D consistency of synthetic faces, which the paper notes as a strength of 3D model-based methods. Moreover, training on fully synthetic data, or combining synthetic data with small amounts of real data, remains an open avenue for further bridging the performance gap between real and synthetic training sets. The public implementation of DCFace paves the way for future work along these dimensions and for further improving the quality and applicability of synthetic face data.
