SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild (1712.01261v2)

Published 2 Dec 2017 in cs.CV

Abstract: We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real world images. This allows the network to capture low frequency variations from synthetic and high frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.

Summary

The SfSNet paper introduces an end-to-end learning framework that uses a physical rendering model and mixed-data training to decompose unconstrained face images into shape, reflectance, and illumination.
Reported results show SfSNet improving normal estimation accuracy by 47% on the Photoface dataset and lighting classification accuracy by 12.5% on the MultiPIE dataset compared to prior methods.
The ability to accurately decompose facial images has practical implications for applications like augmented reality, adaptive image editing, and re-lighting, potentially generalizing to inverse rendering for other objects.

SfSNet: Learning Shape, Reflectance, and Illuminance of Faces in the Wild

The paper "SfSNet: Learning Shape, Reflectance, and Illuminance of Faces in the Wild" addresses inverse rendering for faces: decomposing an unconstrained human face image into its intrinsic components of shape, reflectance, and illumination. The proposed solution, SfSNet, is an end-to-end learning framework whose design mirrors a physical Lambertian rendering model.

Overview of Methodology

SfSNet addresses a limitation of previous inverse rendering methods, which often relied heavily on synthetic datasets and generalized poorly to real-world images because of differences in illumination and expression. It does so with a mixed-data training paradigm termed 'SfS-supervision': the network is trained on both labeled synthetic data and unlabeled real-world images. The architecture introduces a decomposition design with residual blocks that learns a complete separation of albedo and normal features.
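The mixed-data idea above can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the paper's actual objective: it assumes an L1 penalty, equal hypothetical weights `w_label` and `w_recon`, predictions stored as flat lists, and a reconstruction of the input image that has already been rendered from the predicted components. The names `l1` and `mixed_loss` are made up for illustration. Labeled synthetic samples contribute supervised terms plus the photometric reconstruction term mentioned in the abstract, while unlabeled real samples contribute the reconstruction term only.

```python
def l1(pred, target):
    """Mean absolute difference between two equal-length flat lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mixed_loss(pred, recon, image, labels=None, w_label=1.0, w_recon=1.0):
    """Loss for one sample in a mixed synthetic/real batch (illustrative only).

    pred   -- dict with flat lists under 'normal', 'albedo', 'light'
    recon  -- image re-rendered from the predicted components
    image  -- the input image
    labels -- dict with the same keys as pred, or None for unlabeled real data
    """
    # Every sample, labeled or not, is asked to reproduce its input image.
    loss = w_recon * l1(recon, image)
    # Only labeled (synthetic) samples add direct supervision on each component.
    if labels is not None:
        for key in ("normal", "albedo", "light"):
            loss += w_label * l1(pred[key], labels[key])
    return loss
```

Calling `mixed_loss(pred, recon, image)` without `labels` corresponds to a real image; passing `labels` corresponds to a synthetic one. The weights would need tuning in any real training setup.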

Key architectural components include separate sequences of residual blocks for learning normal and albedo, whose outputs are combined with the original image to predict lighting. The approach is inspired by classical Shape from Shading (SfS) techniques, recast in a deep learning framework so that the network captures low-frequency variations from synthetic data and high-frequency details from real images.
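The Lambertian image formation that the decomposition relies on can be sketched in a few lines. The example below is illustrative and not the authors' code. It assumes grayscale pixels stored as flat lists, lighting expressed as nine second-order spherical harmonics coefficients (a common parametrization for Lambertian shading that this summary does not itself specify), lighting coefficients that already absorb any Lambertian convolution factors, and an L1 photometric reconstruction loss. The function names are hypothetical.

```python
def sh_basis(normal):
    """Nine real spherical-harmonics basis values (orders 0-2) for a unit normal."""
    x, y, z = normal
    return [
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def shade(normal, light):
    """Shading at one pixel: dot product of the SH basis with 9 light coefficients."""
    return sum(b * c for b, c in zip(sh_basis(normal), light))


def render(albedo, normals, light):
    """Lambertian image formation: pixel = albedo * shading(normal, light)."""
    return [a * shade(n, light) for a, n in zip(albedo, normals)]


def photometric_l1(image, albedo, normals, light):
    """Mean absolute error between the input image and its re-rendering."""
    rendered = render(albedo, normals, light)
    return sum(abs(i - r) for i, r in zip(image, rendered)) / len(image)
```

Because the loss depends only on the input image and the predicted components, it can be evaluated on real images that have no ground-truth labels. Re-lighting follows from the same function: keep `albedo` and `normals` and call `render` with a different `light`.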

Numerical Results and Claims

SfSNet is evaluated on several datasets. On the Photoface dataset, it improves normal estimation accuracy by 47% compared to prior methods. The improvement is measured with angular error metrics on the predicted normals and indicates robustness to complex variations in real-world lighting conditions. On the MultiPIE dataset, SfSNet improves lighting classification accuracy by 12.5% over state-of-the-art methods.
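Angular error metrics for normals can be computed as follows. This is a minimal sketch that assumes normals are given as non-zero 3-vectors in matching order. The exact evaluation protocol is not described in this summary, so the helper names and the threshold-based accuracy are illustrative assumptions.

```python
import math


def angular_error_deg(p, q):
    """Angle in degrees between two non-zero 3-vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error(pred, gt):
    """Average per-pixel angular error in degrees."""
    errors = [angular_error_deg(p, q) for p, q in zip(pred, gt)]
    return sum(errors) / len(errors)


def fraction_below(pred, gt, threshold_deg):
    """Share of pixels whose angular error is under the given threshold."""
    errors = [angular_error_deg(p, q) for p, q in zip(pred, gt)]
    return sum(1 for e in errors if e < threshold_deg) / len(errors)
```

A lower mean angular error and a higher fraction of pixels under a fixed threshold both indicate better normal estimates.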

Implications and Future Directions

SfSNet's contributions are both theoretical and practical. Accurate decomposition of face images into intrinsic components is useful for augmented reality, where facial illumination and reflectance must be understood accurately. The framework also enables image editing operations such as re-lighting and light transfer, which extends its utility across creative and technical domains.

Looking forward, the research suggests several directions. Combining supervised training with a photometric reconstruction loss points towards more holistic training paradigms, possibly extending beyond faces to more general inverse rendering solutions for other object categories. More broadly, SfSNet's approach encourages further work on combining labeled synthetic and unlabeled real data efficiently to improve generalization in visual perception tasks.

In summary, SfSNet advances inverse rendering for faces by pairing a decomposition architecture built on residual blocks with mixed synthetic and real training data, and it reports better results than prior methods on normal and lighting estimation.
