
FFHQ-UV-Intrinsics: High-Res Facial Reflectance Data

Updated 26 December 2025
  • FFHQ-UV-Intrinsics is a facial dataset offering UV-mapped diffuse albedo, specular albedo, ambient occlusion, translucency, normals, and displacement maps at 4K resolution for over 10,000 identities.
  • The automated MoSAR pipeline uses semi-supervised training, differentiable rendering, and U-Net inpainting to generate complete, neutral-light UV maps that support both forward and inverse rendering research.
  • Compared to traditional light-stage and GAN-based methods, the dataset delivers comprehensive intrinsic decomposition with open licensing (CC-BY-4.0), enabling advanced applications in relighting and statistical face modeling.

FFHQ-UV-Intrinsics is a large-scale facial dataset providing per-subject UV-mapped intrinsic face attributes, including diffuse albedo, specular albedo, ambient occlusion (AO), translucency (thickness), normal maps, and displacement, for over 10,000 unique identities at 4K resolution. Building on the FFHQ-UV corpus and motivated by the need for scalable, relightable, and disentangled face reflectance data, the dataset is generated through a fully automated deep-learning workflow (MoSAR) that combines semi-supervised training on synthetic and real images with light-stage ground-truth supervision. In contrast to traditional light-stage captures (hundreds of subjects, limited maps) or proprietary GAN-based pipelines (only partial or non-disentangled maps), FFHQ-UV-Intrinsics offers a unified, high-resolution, and public resource for forward and inverse rendering, statistical face modeling, and computer vision research (Dib et al., 2023).

1. Purpose, Scope, and Comparison to Prior Work

FFHQ-UV-Intrinsics is designed to democratize access to relightable facial intrinsic data at scale. The resource provides, for each subject:

  • Diffuse albedo (𝔇), specular albedo (𝔖), ambient occlusion (𝔄), translucency thickness (𝕋), normal map (𝔑), and displacement (δ), all as explicit UV-space images aligned via a fixed, canonical face template.
  • 10,000 unique identities, corresponding to the highest-confidence subset of the public FFHQ-UV corpus (∼54,000 available UV textures), offering orders of magnitude greater size and attribute diversity than prior public datasets.

Relative to established prior efforts (e.g., the USC Light Stage or AvatarMe), which cover on the order of 100–1,000 subjects and typically lack full SVBRDF decomposition or public availability, FFHQ-UV-Intrinsics is the first to release all major relightable facial constituents at 4K UV resolution and at open scale (Dib et al., 2023).

2. Data Acquisition and Processing Pipeline

The dataset is assembled through a multi-stage, fully automated framework:

  1. Input Preparation: Starting from FFHQ-UV facial crops (512² StyleGAN-generated or real photographic textures).
  2. 3D Shape Estimation: Semi-supervised GNN-based morphable model fitting and differentiable rendering produce per-subject meshes along with precise camera and lighting parameters.
  3. UV Texture Construction: Original images are projected onto the canonical UV atlas. Incomplete UV textures are completed with a U-Net–based inpainting model.
  4. Lighting Normalization: Meshes rendered under randomly sampled HDR environments produce training data to "undo" illumination effects, yielding a "neutral-light" UV map via an additional U-Net.
  5. Intrinsic Map Prediction: Five distinct generators (for 𝔇, 𝔖, 𝔄, 𝕋, δ) are trained using supervised light-stage data and self-supervision through a differentiable shading energy (see Section 3).
  6. Super-Resolution: All 512² maps are upsampled to 4096² via two-stage ESRGAN, ensuring fine detail for PBR workflows.

Once trained, the pipeline produces the entire dataset in a single automated pass, enabling scalability and uniformity (Dib et al., 2023).
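
The stages above compose into a single per-subject pass. The following is a purely structural sketch of that composition; every stage name and signature is a hypothetical placeholder chosen for illustration, not part of the released MoSAR code, and each stage callable would have to be bound to an actual model.

```python
from typing import Callable, Dict
import numpy as np

# Purely structural sketch of the automated per-subject pass described above.
# The stage names are hypothetical placeholders, not the released MoSAR API.
Stage = Callable[..., object]

def run_pipeline(image: np.ndarray,
                 estimate_shape: Stage,      # stage 2: morphable-model fitting + differentiable rendering
                 project_to_uv: Stage,       # stage 3a: projection onto the canonical UV atlas
                 inpaint_uv: Stage,          # stage 3b: U-Net completion of occluded texels
                 delight_uv: Stage,          # stage 4: lighting normalization ("neutral-light" UV map)
                 predict_intrinsics: Stage,  # stage 5: five generators -> diffuse, specular, AO, translucency, displacement
                 upsample_4k: Stage,         # stage 6: two-stage super-resolution, 512^2 -> 4096^2
                 ) -> Dict[str, np.ndarray]:
    mesh, camera, lighting = estimate_shape(image)
    partial_uv = project_to_uv(image, mesh, camera)
    neutral_uv = delight_uv(inpaint_uv(partial_uv), lighting)
    maps_512 = predict_intrinsics(neutral_uv)   # assumed to return a name -> map dict
    return {name: upsample_4k(m) for name, m in maps_512.items()}
```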

3. Intrinsic Map Definitions and Differentiable Shading Model

All maps are defined over the canonical (512×512) left-right-symmetric UV atlas of FFHQ-UV. For each texel (u, v):

  • 𝔇(u, v) ∈ [0,1]³: diffuse albedo (RGB)
  • 𝔖(u, v): specular albedo (scalar or RGB)
  • 𝔄(u, v) ∈ [0,1]: ambient occlusion
  • 𝕋(u, v) ∈ [0,1]: translucency/thickness
  • 𝔑(u, v): reconstructed normals, derived from the displacement δ(u, v) (see the sketch after this list)
  • δ(u, v): displacement map (auxiliary, also released)
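
Since the normal map is described as reconstructed from the displacement, a common way to perform that derivation is via finite differences of the height field. The sketch below is a minimal illustration of this idea; the scale parameter and the tangent-space convention are assumptions, not values specified by the dataset.

```python
import numpy as np

def normals_from_displacement(disp: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Derive a tangent-space normal map from a displacement map via central
    finite differences. `scale` (height-to-texel ratio) is an illustrative
    free parameter, not a dataset-specified value."""
    dv, du = np.gradient(disp.astype(np.float64))            # gradients along the v and u axes
    n = np.stack([-scale * du,                                # tangent-space normal (-dh/du, -dh/dv, 1)
                  -scale * dv,
                  np.ones_like(disp, dtype=np.float64)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)            # normalize per texel
    return n  # components in [-1, 1]; remap with 0.5 * n + 0.5 for image storage
```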

The generative shading equation uses a fully differentiable, physically-based spherical harmonic (SH) lighting model:

$\hat{m} = \mathcal{B}_d + \mathcal{B}_{sss} + \mathcal{B}_s$

where:

  • $\mathcal{B}_d = d \cdot a \cdot \sum_{l=0}^{2} \sum_{m=-l}^{l} A_l \, \gamma_l^m \, Y_l^m(n)$ (Lambertian diffuse, including AO)
  • $\mathcal{B}_{sss} = d \cdot \sum_{l=1}^{2} \sum_{m=-l}^{l} S_l \, \gamma_l^m \, Y_l^m(n)$, with $S_l = \exp(-l^2 / \mathbb{T}^4)$ (subsurface scattering, modulated by thickness)
  • $\mathcal{B}_s = f(\theta) \cdot \sum_{l=0}^{8} \sum_{m=-l}^{l} R_l \, \gamma_l^m \, Y_l^m(r)$, with $f(\theta) = s + (1 - s)(1 - \cos\theta)^5$ and $s$ taken from 𝔖 (specular, with Fresnel gain)

Here $d$ and $a$ denote the per-texel diffuse albedo and ambient occlusion, $\gamma_l^m$ the SH coefficients of the incident lighting, $Y_l^m$ the real SH basis functions, $A_l$ the Lambertian band attenuation factors, $R_l$ per-band coefficients of the specular lobe, $n$ the surface normal, and $r$ the reflection direction.

All terms are constructed for full backpropagation, supporting end-to-end learning of shading and reflectance component separation (Dib et al., 2023).
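
A minimal NumPy sketch of this shading composite is given below. It truncates all three terms at $l = 2$ (the specular sum in the paper runs to $l = 8$), and it treats the SH lighting coefficients, the per-band specular coefficients, the reflection direction, and the Fresnel cosine as user-supplied inputs; the function and variable names are illustrative and do not come from the MoSAR release.

```python
import numpy as np

# Real spherical-harmonic (SH) basis up to l = 2 (9 coefficients), evaluated
# at a unit direction (x, y, z); standard real SH normalization constants.
def sh_basis_l2(d):
    x, y, z = d
    return np.array([
        0.282095,                        # l=0, m=0
        0.488603 * y,                    # l=1, m=-1
        0.488603 * z,                    # l=1, m=0
        0.488603 * x,                    # l=1, m=1
        1.092548 * x * y,                # l=2, m=-2
        1.092548 * y * z,                # l=2, m=-1
        0.315392 * (3.0 * z * z - 1.0),  # l=2, m=0
        1.092548 * x * z,                # l=2, m=1
        0.546274 * (x * x - y * y),      # l=2, m=2
    ])

BAND = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])              # band index l of each SH coefficient
A_L = np.array([np.pi, 2.0 * np.pi / 3.0, np.pi / 4.0])   # Lambertian band attenuation A_l

def shade_texel(diffuse, specular, ao, thickness, normal, refl, gamma, R_l, cos_theta):
    """Sketch of m_hat = B_d + B_sss + B_s for one texel, truncated to l = 2.
    gamma: (9, 3) SH lighting coefficients; R_l: (3,) per-band specular
    coefficients; cos_theta: cosine in the Schlick Fresnel gain. These inputs
    and the truncation are illustrative assumptions."""
    Y_n = sh_basis_l2(normal)   # SH basis at the surface normal
    Y_r = sh_basis_l2(refl)     # SH basis at the reflection direction

    # Lambertian diffuse term, attenuated by ambient occlusion.
    B_d = diffuse * ao * (gamma * (A_L[BAND] * Y_n)[:, None]).sum(axis=0)

    # Subsurface term: bands l >= 1 damped by S_l = exp(-l^2 / T^4).
    S_l = np.exp(-np.arange(3) ** 2 / max(thickness, 1e-4) ** 4)
    keep = (BAND >= 1).astype(float)
    B_sss = diffuse * (gamma * (keep * S_l[BAND] * Y_n)[:, None]).sum(axis=0)

    # Specular term with Schlick Fresnel gain f(theta) = s + (1 - s)(1 - cos theta)^5.
    fresnel = specular + (1.0 - specular) * (1.0 - cos_theta) ** 5
    B_s = fresnel * (gamma * (R_l[BAND] * Y_r)[:, None]).sum(axis=0)

    return B_d + B_sss + B_s
```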

4. Data Structure, Formats, and UV Parameterization

Each subject’s data is organized as follows:

File name        | Contents                           | Format
---------------- | ---------------------------------- | -------
diffuse.exr      | Diffuse albedo (RGBA, 4K)          | OpenEXR
specular.exr     | Specular albedo (4K)               | OpenEXR
ao.exr           | Ambient occlusion (4K)             | OpenEXR
translucency.exr | Translucency (thickness)           | OpenEXR
normal.exr       | UV normal map (4K)                 | OpenEXR
displacement.exr | Displacement (auxiliary)           | OpenEXR
uv_basecolor.png | Light-normalized albedo (optional) | PNG
metadata.json    | Camera, SH lighting, etc.          | JSON

Naming follows the pattern <subjectID>_<mapname>.<ext>. Every map is stored on a 4096×4096 UV grid, giving 1:1 pixel correspondence across all subjects and maps. The underlying mesh topology and UV mapping are identical to FFHQ-UV, with fixed canonical face spacing and symmetrical left/right alignment. A plausible implication is that this parameterization enables pixel-wise statistical modeling, such as PCA or deep generative models, across the full corpus.
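
A minimal loading sketch is shown below. It assumes the <subjectID>_<mapname>.<ext> layout described above (the subject ID "00001" and the root path are hypothetical examples) and uses OpenCV's EXR reader, which in recent builds requires the OPENCV_IO_ENABLE_OPENEXR environment variable; the dataset itself does not mandate a particular loader.

```python
import os
import json

os.environ.setdefault("OPENCV_IO_ENABLE_OPENEXR", "1")  # must be set before cv2 is imported
import cv2
import numpy as np

MAP_NAMES = ["diffuse", "specular", "ao", "translucency", "normal", "displacement"]

def load_subject(root: str, subject_id: str) -> dict:
    """Load all intrinsic UV maps plus metadata for one subject, assuming the
    <subjectID>_<mapname>.<ext> naming described above."""
    data = {}
    for name in MAP_NAMES:
        path = os.path.join(root, f"{subject_id}_{name}.exr")
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)          # float EXR, 4096 x 4096
        if img is None:
            raise FileNotFoundError(path)
        if img.ndim == 3 and img.shape[2] >= 3:               # OpenCV loads BGR(A); reorder to RGB(A)
            img = img[..., [2, 1, 0] + list(range(3, img.shape[2]))]
        data[name] = np.asarray(img)
    with open(os.path.join(root, f"{subject_id}_metadata.json")) as f:
        data["metadata"] = json.load(f)                       # camera and SH lighting parameters
    return data

# Hypothetical usage:
# subject = load_subject("/path/to/ffhq_uv_intrinsics", "00001")
# print(subject["diffuse"].shape, subject["metadata"].keys())
```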

No official data splits (train/val/test) are imposed; internal MoSAR experiments used a 90/10 split, but end-users define their own. All maps are distributed under CC-BY-4.0, supporting both academic and commercial use with attribution (Dib et al., 2023).

5. Benchmark Performance and Example Applications

MoSAR reports the following per-map SSIM values against light-stage ground truth for held-out subjects (a minimal evaluation sketch follows the list):

  • Diffuse: 0.83
  • Specular: 0.65
  • AO: 0.85
  • Translucency: 0.82
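
Comparable numbers can be computed for one's own predictions and reference maps with a standard SSIM implementation. The sketch below uses scikit-image and assumes maps loaded as float arrays in [0, 1]; it does not reproduce the paper's exact evaluation protocol (resolution, masking), which is not specified here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def per_map_ssim(pred: dict, ref: dict) -> dict:
    """SSIM per intrinsic map between predicted and reference UV maps
    (float arrays in [0, 1]); protocol details are illustrative assumptions."""
    scores = {}
    for name in ["diffuse", "specular", "ao", "translucency"]:
        p = pred[name].astype(np.float64)
        r = ref[name].astype(np.float64)
        scores[name] = structural_similarity(
            p, r, data_range=1.0,
            channel_axis=-1 if p.ndim == 3 else None)
    return scores
```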

Practical applications demonstrated include:

  • Robust full-face relighting (using all four intrinsic maps with standard PBR shaders in engines such as Unity and Unreal Engine).
  • End-to-end inverse rendering pipelines and supervised training of models for SVBRDF estimation.
  • Statistical face modeling (PCA or latent-variable models), enabled by the matched UV geometry and attribute correspondence (see the sketch after this list).
  • Code examples for loading EXR maps and configuring standard game/graphics pipelines are provided.
  • A plausible implication is that the coverage and fidelity make FFHQ-UV-Intrinsics suitable for rigorous psychophysical, statistical, or generative modeling work.
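
Because every map shares the same UV parameterization, pixel-wise statistics can be taken directly across subjects. The sketch below fits a PCA model to downsampled diffuse albedo maps; the downsampling stride, the number of components, and the choice of the diffuse map are illustrative assumptions rather than dataset conventions.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_albedo_pca(albedo_maps, n_components: int = 50, stride: int = 16) -> PCA:
    """Fit a pixel-wise PCA over UV-aligned diffuse albedo maps.
    `stride` subsamples the 4K grid to keep the design matrix manageable;
    both parameters are illustrative choices."""
    X = np.stack([m[::stride, ::stride].reshape(-1) for m in albedo_maps])  # (subjects, features)
    return PCA(n_components=n_components).fit(X)

# Hypothetical usage, reusing load_subject from the loading sketch above:
# albedos = [load_subject(root, sid)["diffuse"] for sid in subject_ids]
# model = fit_albedo_pca(albedos)
# print(model.explained_variance_ratio_[:10])
```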

6. Historical Context, Downstream Relevance, and Licensing

FFHQ-UV-Intrinsics integrates advances in photometric and multi-view face capture (Seck et al., 2016) with modern large-scale generative pipelines (Dib et al., 2023). Earlier methods (photometric light stage with geometric fusion) enabled controlled, high-accuracy decompositions but at high cost and small scale (Seck et al., 2016). GAN-based approaches forfeited disentanglement and intrinsic parameterization. By automating the entire process, leveraging both supervised and self-supervised objectives, and aligning all data to a unified UV space, FFHQ-UV-Intrinsics closes this gap, supporting downstream applications in relighting, animation, inverse rendering, vision science, and more.

The dataset is distributed under CC-BY-4.0, with no NDA, and commercial use is permitted with appropriate attribution to MoSAR: “Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading” (Dib et al., 2023; CVPR 2024). The release includes all raw maps, derived assets, and metadata.

Compared to previous resources:

Dataset               | #Subjects | Released Maps                                              | Availability
--------------------- | --------- | ---------------------------------------------------------- | ----------------
USC Light Stage       | ~100      | Diffuse, specular                                          | Public, limited
AvatarMe / Relightify | ~1,000    | Diffuse, specular, normals                                 | Proprietary
FFHQ-UV-Intrinsics    | 10,000    | Diffuse, specular, AO, translucency, normals, displacement | Public, open

FFHQ-UV-Intrinsics is notable for scale, completeness, and open licensing. Limitations mentioned include:

  • Only the highest-confidence subset (~10,000 of 54,000 FFHQ-UV identities) is released.
  • Map accuracy (SSIM) is highest for the diffuse and AO maps and lower for the specular map, suggesting that disentangling specular highlights from single-image cues remains difficult.
  • Data splits, demographic attributes, and some metadata fields are not formally specified in the main paper.

A plausible implication is the resource will facilitate improvements in single-image reflectance estimation, domain adaptation, and generalizable facial reconstruction, and support benchmarks for disentanglement and relighting tasks across vision and graphics.


References:

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading (Dib et al., 2023)

Ear-to-ear Capture of Facial Intrinsics (Seck et al., 2016)
