- The paper introduces a novel method for generating synthetic NIR-VIS facial images, effectively bridging the domain gap in face recognition.
- The method utilizes 3D face reconstruction and a VIS-to-NIR transformation pipeline to create photorealistic image pairs under varying conditions.
- The proposed ID-MMD loss aligns identity features across modalities, improving recognition robustness even in low-shot scenarios.
Physically-Based Face Rendering for NIR-VIS Face Recognition
The paper presents a novel approach to Near-Infrared (NIR) to Visible (VIS) face recognition that tackles two persistent obstacles: the large domain gap between the modalities and the scarcity of paired cross-modality training data. The authors generate paired NIR-VIS facial images through physically-based rendering, leveraging 3D facial reconstruction and a VIS-to-NIR reflectance transformation to create a large, diverse, and photorealistic dataset.
Methodology Overview
The authors' methodology involves several key components:
- 3D Face Reconstruction and Reflectance Mapping: Starting from 2D face datasets, the method reconstructs 3D face geometry along with reflectance attributes such as diffuse and specular albedo, drawing on state-of-the-art reflectance acquisition techniques to obtain high-quality VIS assets.
- VIS-to-NIR Reflectance Transformation: An empirical, wavelength-based model adapts the VIS reflectance attributes, such as diffuse and specular albedo, to the NIR domain while preserving identity consistency across modalities (a minimal albedo-mapping sketch follows this list).
- Physically-Based Rendering: A rendering pipeline produces synthetic NIR-VIS image pairs that are photorealistic and allow systematic control over identity, pose, expression, and lighting (a condition-sampling sketch follows this list).
- Identity-Based Maximum Mean Discrepancy (ID-MMD) Loss: To minimize the modality gap in feature space, the authors propose an ID-MMD loss that aligns per-identity feature centroids rather than fine-grained sample features, reducing inter-modality discrepancy without washing out identity information (a centroid-MMD sketch closes this list).
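To make the reflectance transformation concrete, below is a minimal sketch of a wavelength-based albedo mapping. The function name `vis_to_nir_albedo`, the channel weights, and the gain are illustrative assumptions, not the paper's fitted model; a real model would be calibrated against measured skin reflectance spectra.

```python
import numpy as np

# Illustrative per-channel weights and gain (hypothetical values):
# a fitted model would derive these from measured skin reflectance spectra.
RGB_TO_NIR_WEIGHTS = np.array([0.30, 0.35, 0.35])
NIR_GAIN = 1.15  # skin tends to reflect more strongly in the NIR band

def vis_to_nir_albedo(diffuse_rgb: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) VIS diffuse-albedo map to an (H, W) NIR map.

    A weighted blend of the RGB channels followed by a global gain,
    clipped to the valid reflectance range [0, 1].
    """
    nir = diffuse_rgb @ RGB_TO_NIR_WEIGHTS
    return np.clip(nir * NIR_GAIN, 0.0, 1.0)
```

The same per-wavelength reweighting idea would apply to the specular albedo map, so that a single set of 3D assets yields consistent inputs for both rendering passes.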
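The rendering step can be thought of as sweeping a physically-based renderer over sampled scene conditions. The parameter names and ranges below are hypothetical placeholders for whatever the actual pipeline exposes; the point is that each sampled configuration is rendered in both modalities, yielding a pixel-aligned NIR-VIS pair.

```python
import random

def sample_render_config(identity_id: int) -> dict:
    """Draw one rendering configuration (hypothetical parameter space)."""
    return {
        "identity": identity_id,
        "yaw_deg": random.uniform(-45.0, 45.0),
        "pitch_deg": random.uniform(-15.0, 15.0),
        "expression": random.choice(["neutral", "smile", "frown"]),
        "environment": random.choice(["indoor", "outdoor", "studio"]),
        "modalities": ("vis", "nir"),  # render each config in both bands
    }

# Example: 10 pose/expression/lighting variations for one identity.
configs = [sample_render_config(identity_id=7) for _ in range(10)]
```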
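For the ID-MMD loss, the sketch below follows the standard Gaussian-kernel MMD recipe applied to per-identity centroids; the exact kernel and weighting used in the paper may differ.

```python
import torch

def gaussian_mmd(x: torch.Tensor, y: torch.Tensor,
                 sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between sample sets x and y under a Gaussian kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def id_mmd_loss(feat_nir: torch.Tensor, feat_vis: torch.Tensor,
                labels: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Align per-identity feature centroids across the two modalities.

    feat_nir, feat_vis: (B, D) embeddings from the NIR and VIS branches.
    labels: (B,) identity labels, with each identity in both batches.
    """
    ids = labels.unique()
    c_nir = torch.stack([feat_nir[labels == i].mean(dim=0) for i in ids])
    c_vis = torch.stack([feat_vis[labels == i].mean(dim=0) for i in ids])
    return gaussian_mmd(c_nir, c_vis, sigma)
```

Because centroids average out pose, expression, and lighting variation within an identity, the loss pulls the two modalities' identity distributions together without forcing sample-level alignment.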
Experimental Results
The method is evaluated on several benchmarks, including CASIA NIR-VIS 2.0 and LAMP-HQ. Trained on the synthetic data alone, without any existing NIR-VIS dataset, the pipeline combined with the ID-MMD loss achieves performance on par with state-of-the-art methods, and a small amount of fine-tuning on the target datasets yields further significant gains.
Contributions and Implications
The contributions of this research are significant:
- NIR-VIS Data Generation: The framework enables the creation of extensive paired datasets, alleviating data scarcity issues that limit model training effectiveness.
- Enhanced Feature Learning: Through ID-MMD, the approach aligns high-level identity features across modalities, improving the network's cross-modality generalization.
- SOTA Performance: The approach outperforms existing models, especially in challenging low-shot scenarios, highlighting its robustness and applicability.
Future Directions
This paper sets a precedent for employing physically-based rendering to tackle cross-modal recognition tasks. Future work could extend similar methodologies to spectral ranges beyond NIR-VIS, integrate generative models to further improve the realism of synthetic data, or adapt the framework for real-time use on mobile devices.
Conclusion
This research presents clear advancements in NIR-VIS face recognition through innovative synthetic data generation and a carefully designed loss function. It addresses critical challenges within the domain effectively, potentially influencing future work in cross-modal recognition and synthetic data augmentation.