
Single-Image Portrait Relighting

Updated 12 March 2026
  • Single-image portrait relighting synthesizes realistic new lighting on a human portrait from one unconstrained photo by disentangling intrinsic facial properties from external illumination.
  • It relies on deep neural networks, ranging from U-Net encoder–decoders to intrinsic-decomposition pipelines and 3D-aware models, to produce photorealistic relit outputs with precise control over shadows and highlights.
  • Applications span photographic editing, AR/VR content, and visual effects; performance is evaluated with image-similarity metrics such as SSIM and LPIPS alongside pixel-error measures such as RMSE.

Single image portrait relighting is the task of synthesizing the appearance of a human portrait under novel scene illumination, using only a single input image. This process enables realistic manipulation of perceived lighting conditions and is foundational for applications including photographic editing, AR/VR content, visual effects, and computational photography. Unlike classic approaches requiring controlled light stages, detailed 3D reconstructions, or multiple captures under varying conditions, single image relighting seeks to extract, transform, and recombine lighting information directly from unconstrained consumer photos. The technical challenge centers on disentangling intrinsic facial properties (albedo, geometry) from illumination, then re-synthesizing the image under a user-specified lighting environment, typically encoded as a high dynamic range (HDR) environment map.

1. Problem Definition and Physical Formulation

Formally, the input is an RGB portrait $I \in \mathbb{R}^{H \times W \times 3}$ acquired under unknown or arbitrary conditions, optionally with a foreground mask $M$. The user supplies a target lighting environment $L_\mathrm{target} \in \mathbb{R}^{H_e \times W_e \times 3}$, typically parameterized as a latitude–longitude HDR panorama. The relighting task is to produce an image $R$ of the same subject as if illuminated by $L_\mathrm{target}$, while optionally estimating the original environment $L_\mathrm{source}$. The canonical image formation model is

$$R(p) = \int_{\omega \in \Omega} L(\omega)\,\rho(p, \omega)\,\max\big(n(p)\cdot \omega,\, 0\big)\,d\omega$$

where $L(\omega)$ is the incident radiance arriving from direction $\omega$, $\Omega$ is the sphere of incident directions, $\rho$ is the spatially varying BRDF, and $n(p)$ is the surface normal at pixel $p$. Rather than inverting this model explicitly, recent methods learn a feedforward mapping $(\hat{L}_\mathrm{source}, R) = f(I, L_\mathrm{target})$ with deep neural networks trained on paired or synthetic relighting data (Sun et al., 2019).
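As a concrete reading of the formation model, the sketch below evaluates the integral for a single pixel as a Riemann sum over a latitude–longitude environment map. It assumes a Lambertian BRDF $\rho = \text{albedo}/\pi$, a +y-up equirectangular convention, and RGB tuples stored in nested Python lists; the function names are hypothetical and not taken from any cited method.

```python
import math

def latlong_direction(row, col, height, width):
    """Unit direction and solid angle of an equirectangular pixel centre.

    Assumed convention: +y is up, theta is the polar angle from +y,
    phi is the azimuth around the y axis.
    """
    theta = (row + 0.5) * math.pi / height
    phi = (col + 0.5) * 2.0 * math.pi / width
    direction = (math.sin(theta) * math.cos(phi),
                 math.cos(theta),
                 math.sin(theta) * math.sin(phi))
    # Pixel solid angle: sin(theta) * d_theta * d_phi
    solid_angle = math.sin(theta) * (math.pi / height) * (2.0 * math.pi / width)
    return direction, solid_angle

def render_lambertian_pixel(normal, albedo, env):
    """Approximate R(p) = sum over env pixels of L(w) * rho * max(n.w, 0) * dw.

    Assumes rho = albedo / pi (Lambertian); env is a list of rows of RGB tuples.
    """
    height, width = len(env), len(env[0])
    irradiance = [0.0, 0.0, 0.0]
    for row in range(height):
        for col in range(width):
            direction, d_omega = latlong_direction(row, col, height, width)
            cosine = max(sum(n * d for n, d in zip(normal, direction)), 0.0)
            if cosine == 0.0:
                continue  # directions below the surface contribute nothing
            for c in range(3):
                irradiance[c] += env[row][col][c] * cosine * d_omega
    return tuple(a / math.pi * e for a, e in zip(albedo, irradiance))
```

Under a uniform, unit-radiance environment (a "white furnace" test) the result is close to the albedo, which is a quick sanity check for both the solid-angle weights and the $1/\pi$ normalization.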

Practically, relighting methods must handle ambiguities in geometry, reflectance, and lighting, provide spatial control over shadows/specularities, and generalize to "in-the-wild" input with diverse appearances, occlusions, and background content. Success is measured by perceptual realism, geometric consistency, and quantitative similarity to ground-truth relit images.
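Quantitative similarity to ground truth is usually reported with pixel-level metrics. A minimal sketch of three of them (RMSE, scale-invariant RMSE, PSNR) on flattened pixel lists follows; the scale-invariant form shown, which first fits a single least-squares scale factor, is one common definition, and published evaluations may instead use per-channel or masked variants. Function names are hypothetical.

```python
import math

def rmse(pred, target):
    """Root-mean-square error over flattened pixel values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def si_rmse(pred, target):
    """Scale-invariant RMSE: RMSE after scaling pred by the least-squares
    factor alpha = <pred, target> / <pred, pred> (one assumed definition)."""
    denom = sum(p * p for p in pred)
    alpha = sum(p * t for p, t in zip(pred, target)) / denom if denom > 0 else 0.0
    return rmse([alpha * p for p in pred], target)

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB, assuming values in [0, max_value]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)
```

The scale-invariant variant is useful when a method recovers lighting only up to an unknown global exposure, since a prediction that is correct except for brightness scores zero.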

2. Datasets and Synthetic Data Generation

Access to comprehensive, physically consistent training data is a principal bottleneck for portrait relighting research. High-fidelity datasets are typically captured with light stage apparatus:

  • Arrays of hundreds of LED sources (OLAT: One-Light-At-a-Time) sample directional lighting; multi-view camera rigs provide multi-angle supervision; per-frame tracking corrects for subtle subject motions (Sun et al., 2019).
  • Post-processing workflows render a subject under an HDR environment by linearly combining its OLAT images, weighting each by the environment radiance integrated over the solid angle associated with that light, or via a spherical-harmonic approximation of the environment.
  • Public HDR environment-map collections used as lighting sources include Laval Indoor/Outdoor and PolyHaven (Mei et al., 2024); recent large-scale, multi-expression OLAT portrait datasets include POLAR (220 subjects, 156 lights, 28M images) (Chen et al., 15 Dec 2025) and FaceOLAT (139 subjects, 331 lights, 4K resolution) (Rao et al., 17 Oct 2025).
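The OLAT compositing step can be sketched as linear image-based relighting. In the sketch below, each light's RGB weight comes from assigning every environment-map pixel to its nearest light direction and summing radiance times solid angle. This nearest-light partition, the +y-up lat-long convention, and the omission of per-light intensity calibration are simplifying assumptions, and the helper names are hypothetical.

```python
import math

def latlong_direction(row, col, height, width):
    # Assumed equirectangular convention: +y up, theta from +y, phi around y.
    theta = (row + 0.5) * math.pi / height
    phi = (col + 0.5) * 2.0 * math.pi / width
    d = (math.sin(theta) * math.cos(phi), math.cos(theta), math.sin(theta) * math.sin(phi))
    d_omega = math.sin(theta) * (math.pi / height) * (2.0 * math.pi / width)
    return d, d_omega

def olat_weights(env, light_dirs):
    """Per-light RGB weight: env radiance x solid angle, summed over the env
    pixels whose direction is closest to that light (nearest-light partition)."""
    height, width = len(env), len(env[0])
    weights = [[0.0, 0.0, 0.0] for _ in light_dirs]
    for row in range(height):
        for col in range(width):
            d, d_omega = latlong_direction(row, col, height, width)
            nearest = max(range(len(light_dirs)),
                          key=lambda k: sum(a * b for a, b in zip(light_dirs[k], d)))
            for c in range(3):
                weights[nearest][c] += env[row][col][c] * d_omega
    return [tuple(w) for w in weights]

def relight_from_olat(olat_images, weights):
    """Linear relighting: relit = sum_k weights[k] * olat_images[k], per channel.

    Each OLAT image is a flat list of RGB tuples with the same pixel count."""
    relit = []
    for p in range(len(olat_images[0])):
        pixel = [0.0, 0.0, 0.0]
        for image, w in zip(olat_images, weights):
            for c in range(3):
                pixel[c] += w[c] * image[p][c]
        relit.append(tuple(pixel))
    return relit
```

Because light transport is linear in the illumination, this weighted sum reproduces the subject's appearance under the environment up to the angular resolution of the light stage.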

To bypass hardware constraints, several works synthesize paired data:

  • Virtual light stage rendering assembles detailed 3D faces, hair, clothing, and accessories, then renders them under randomized panoramic environment maps with physically based rendering engines (e.g., Arnold, Blender Cycles) (Yeh et al., 2022, Chaturvedi et al., 16 Jan 2025).
  • To bridge the synthetic-to-real domain gap, methods add real portraits to training for residual learning, apply GAN-based refinement, or use multi-task objectives (Yeh et al., 2022, Chaturvedi et al., 16 Jan 2025).

Synthetic data gives explicit control over all ground-truth factors (albedo, normals, HDR lighting, shadow masks), which is critical both for learning disentangled representations and for benchmarking.

3. Network Architectures and Representational Strategies

Portrait relighting architectures reflect two foundational paradigms: direct image-to-image translation and physically motivated inverse rendering.

Encoder–Decoder and U-Net Variants

  • Many early and baseline methods adopt U-Net-style encoder–decoders, employing skip connections and foreground masks to preserve facial details and suppress backgrounds (Sun et al., 2019).
  • Auxiliary prediction heads may regress lighting parameters (e.g., environment maps or SH coefficients) with spatial confidences, facilitating both relit outputs and lighting estimation (Sun et al., 2019).
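Applying a foreground mask to suppress the background usually amounts to alpha compositing the relit subject over a background. A minimal sketch, assuming per-pixel alpha values in [0, 1] and RGB tuples in flat lists (the function name is hypothetical):

```python
def composite(relit, background, alpha):
    """Per-pixel alpha compositing: out = alpha * relit + (1 - alpha) * background.

    relit and background are flat lists of RGB tuples; alpha is a flat list of
    mask values (1 = subject, 0 = background, fractional at hair and edges).
    """
    return [
        tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg))
        for fg, bg, a in zip(relit, background, alpha)
    ]
```

A soft matte rather than a binary mask keeps fine structures such as hair from producing hard halos at the subject boundary.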

Intrinsic Decomposition and Inverse Rendering

  • Physically inspired pipelines decompose the input into albedo, normals, and lighting using supervised or self-supervised learning; relighting is then performed explicitly with a parametric shading model, such as Lambertian reflectance under spherical-harmonic (SH) lighting (Zehni et al., 2021).
  • Self-supervised approaches enforce invariances across multi-illumination pairs or under geometric flipping and rotation augmentations, and constrain the lighting code to the SH parameter space up to degree 2, i.e., nine basis coefficients (Liu et al., 2020).
  • Feature disentanglement is enhanced by cross-relighting losses and domain adaptation, enabling robust transfer to real images and diverse lighting (Zehni et al., 2021, Yeh et al., 2022).
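The Lambertian-plus-SH shading model fits in a few lines. This sketch uses the nine real SH basis functions up to degree 2 and the cosine-lobe factors $\hat{A}_0 = \pi$, $\hat{A}_1 = 2\pi/3$, $\hat{A}_2 = \pi/4$ from Ramamoorthi and Hanrahan's irradiance formulation. Coefficient ordering and normalization differ between codebases, so the convention here is an assumption rather than that of any cited method, and the function names are hypothetical.

```python
import math

def sh_basis(n):
    """Real SH basis up to degree 2 (9 terms) evaluated at unit normal n = (x, y, z)."""
    x, y, z = n
    return [
        0.5 * math.sqrt(1.0 / math.pi),  # Y_00
        0.488603 * y,                     # Y_1,-1
        0.488603 * z,                     # Y_1,0
        0.488603 * x,                     # Y_1,1
        1.092548 * x * y,                 # Y_2,-2
        1.092548 * y * z,                 # Y_2,-1
        0.315392 * (3.0 * z * z - 1.0),   # Y_2,0
        1.092548 * x * z,                 # Y_2,1
        0.546274 * (x * x - y * y),       # Y_2,2
    ]

# Cosine-lobe convolution factor for each of the 9 terms (degree 0, 1, 2).
A_HAT = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5

def lambertian_sh_shading(normal, albedo, sh_coeffs):
    """Relit RGB = albedo / pi * E(n), with E(n) = sum_i A_hat_i * L_i * Y_i(n).

    sh_coeffs: 9 RGB tuples, the SH projection of the incident radiance.
    Negative irradiance from the truncated expansion is clamped to zero.
    """
    basis = sh_basis(normal)
    irradiance = [sum(A_HAT[i] * sh_coeffs[i][c] * basis[i] for i in range(9))
                  for c in range(3)]
    return tuple(a / math.pi * max(e, 0.0) for a, e in zip(albedo, irradiance))
```

With only the constant term set to the projection of a unit-radiance environment, $L_{00} = 4\pi Y_{00} = 2\sqrt{\pi}$, the shading reduces to the albedo, mirroring the white-furnace check for the full integral.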

3D-Aware and Volumetric Models

  • EG3D-style 3D GANs and tri-plane representations embed single portrait images into geometry-aware latent spaces, supporting full volumetric rendering, viewpoint change, and relighting (Mei et al., 2024, Rao et al., 2024, Rao et al., 17 Oct 2025).
  • Dedicated relighting modules process encoded tri-plane features, target HDR environment maps, and (optionally) explicit head-pose inputs (Mei et al., 2024).
  • OLAT-basis models predict per-light responses in a flow-based or tri-plane-augmented latent space, enabling physically accurate environment mixing and interpretable lighting control (Chen et al., 15 Dec 2025, Rao et al., 17 Oct 2025).
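For intuition on the tri-plane representation: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are aggregated (EG3D sums them) before a small decoder maps the result to density and color features. The sketch below covers only sampling and aggregation; the plane naming, the corner-aligned coordinate mapping, and the helper names are assumptions.

```python
def bilinear(plane, u, v):
    """Bilinearly sample a res x res grid (res >= 2) of feature vectors.

    plane[row][col] is a feature vector; u indexes columns, v indexes rows,
    both in [-1, 1], mapped to [0, res - 1] with corner alignment (assumed).
    """
    res = len(plane)
    u = max(-1.0, min(1.0, u))
    v = max(-1.0, min(1.0, v))
    fx = (u + 1.0) * 0.5 * (res - 1)
    fy = (v + 1.0) * 0.5 * (res - 1)
    x0, y0 = min(int(fx), res - 2), min(int(fy), res - 2)
    tx, ty = fx - x0, fy - y0
    return [
        (1 - tx) * (1 - ty) * plane[y0][x0][k]
        + tx * (1 - ty) * plane[y0][x0 + 1][k]
        + (1 - tx) * ty * plane[y0 + 1][x0][k]
        + tx * ty * plane[y0 + 1][x0 + 1][k]
        for k in range(len(plane[0][0]))
    ]

def triplane_feature(planes, point):
    """Project a point in [-1, 1]^3 onto the xy, xz and yz planes and sum the samples."""
    x, y, z = point
    samples = [
        bilinear(planes["xy"], x, y),
        bilinear(planes["xz"], x, z),
        bilinear(planes["yz"], y, z),
    ]
    return [sum(values) for values in zip(*samples)]
```

Storing three 2D planes instead of a dense 3D grid keeps memory quadratic rather than cubic in resolution, which is what makes such representations practical for single-image inversion.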

Diffusion and Generative Frameworks

4. Training Procedures, Objectives, and Losses

Training methodologies blend supervised, self-supervised, and adversarial strategies:

  • Core photometric losses (L1, L2) between predicted and ground-truth relit pixels, often restricted to or weighted by the segmented foreground (Sun et al., 2019).
  • Lighting consistency: SH or full env-map prediction losses; flow matching in latent OLAT-generation models (Chen et al., 15 Dec 2025).
  • Self-supervision: image reconstruction under original or jittered source lighting (to disentangle albedo and illumination), cross-input consistency on multi-lit pairs (Sun et al., 2019, Zehni et al., 2021).
  • Perceptual (VGG/LPIPS) losses for fine-scale qualitative improvement, especially in generative setups (Mei et al., 2024).
  • Adversarial losses (PatchGAN) may be used but are not strictly necessary for sharpness when physics-based constraints dominate (Chen et al., 15 Dec 2025).
  • Explicit geometric and identity preservation: losses in face-embedding space (ArcFace/MagFace), as well as temporal regularization for video (Rao et al., 17 Oct 2025, Yeh et al., 2022).
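Two of these objectives are simple enough to sketch directly: a foreground-masked L1 photometric loss and an identity loss defined as one minus the cosine similarity of face embeddings. The embeddings are assumed to come from a pretrained face-recognition network such as ArcFace, which is not implemented here; the function names are hypothetical.

```python
import math

def masked_l1(pred, target, mask):
    """Mean absolute error over RGB pixels, weighted by the foreground mask."""
    total, weight = 0.0, 0.0
    for p, t, m in zip(pred, target, mask):
        total += m * sum(abs(a - b) for a, b in zip(p, t))
        weight += m * len(p)
    return total / weight if weight > 0 else 0.0

def identity_loss(embedding_pred, embedding_input):
    """1 - cosine similarity between face embeddings of the relit output and the input."""
    dot = sum(a * b for a, b in zip(embedding_pred, embedding_input))
    norm = (math.sqrt(sum(a * a for a in embedding_pred))
            * math.sqrt(sum(b * b for b in embedding_input)))
    return 1.0 - dot / norm
```

Normalizing by the mask weight keeps the photometric loss comparable across portraits whose subjects occupy different fractions of the frame.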

Typical pipelines combine diverse data augmentations: random cropping, environment map rotations, synthetic shadow/specularity augmentation, and photometric normalization.
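Environment-map rotation is cheap for latitude–longitude panoramas because a rotation about the vertical axis is a circular shift of columns. A minimal sketch, assuming rows of pixels in a nested list and rounding the shift to whole columns (a hypothetical helper; the sign convention is an assumption):

```python
def rotate_envmap(env, degrees):
    """Rotate an equirectangular environment map about the vertical axis.

    Positive angles move content toward larger column indices. The shift is
    rounded to whole columns; sub-column rotations would need interpolation.
    """
    width = len(env[0])
    shift = round(degrees / 360.0 * width) % width
    return [row[width - shift:] + row[:width - shift] for row in env]
```

When the rotated map serves as a relighting target, the ground-truth relit image must reflect the same rotation (for example by recombining OLAT images under the rotated map); otherwise the augmentation corrupts the supervision.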

5. Relighting Inference, Applications, and User Control

At inference time, relighting a new single image typically proceeds as follows:

  • Input images are preprocessed via cropping, matting, or face segmentation; optional normalization or color correction adjusts dynamic range (Sun et al., 2019).
  • The network infers scene attributes (albedo, normals, lighting parameters), performs environment map encoding, and generates the relit output conditioned on the target (Zehni et al., 2021, Mei et al., 2024).
  • OLAT-driven or tri-plane volumetric models enable explicit environment mixing, 3D consistent relighting, and novel viewpoint rendering (Chen et al., 15 Dec 2025, Rao et al., 17 Oct 2025, Rao et al., 2024).
  • Diffusion models provide additional control via classifier-free guidance, trading off identity preservation against relighting strength (Chaturvedi et al., 16 Jan 2025). Recent generative architectures also support text-driven relighting (Liu et al., 17 Jun 2025, Cha et al., 2024), and other systems offer freehand-scribble or parameter-sweep interfaces for intuitive user editing (Mei et al., 2023, Futschik et al., 2023).
  • Applications extend to complete lighting swaps across portraits, light transfer between images, scene harmonization for compositing, video relighting, and semantic/structural editing.
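Classifier-free guidance combines conditional and unconditional noise predictions at each denoising step. The sketch shows the standard single-condition form and, as one illustration of trading off two conditions, the two-scale variant popularized by InstructPix2Pix, with the input portrait and the target lighting as the two conditions. Whether the cited relighting work uses this exact form is an assumption, and noise predictions are plain lists here rather than tensors.

```python
def cfg(eps_uncond, eps_cond, scale):
    """Single-condition guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    scale = 1 returns the conditional prediction; larger values push harder
    toward the condition."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def dual_cfg(eps_none, eps_image, eps_image_light, image_scale, light_scale):
    """Two-scale guidance (InstructPix2Pix-style), assumed for relighting:
    image_scale strengthens adherence to the input portrait (identity),
    light_scale strengthens adherence to the target lighting."""
    return [
        n + image_scale * (i - n) + light_scale * (f - i)
        for n, i, f in zip(eps_none, eps_image, eps_image_light)
    ]
```

Raising `light_scale` relative to `image_scale` favors stronger relighting at the risk of identity drift, which is the trade-off described above.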

6. Quantitative Metrics, Limitations, and Comparative Results

Standard evaluation metrics include RMSE, scale-invariant RMSE, SSIM, DSSIM, PSNR, LPIPS, and face-ID similarity (cosine in embedding space) (Sun et al., 2019, Mei et al., 2024, Cha et al., 2024, Chen et al., 15 Dec 2025). State-of-the-art models on held-out test sets report (example values):

Noted limitations include:

7. Emerging Directions and Technical Extensions

Recent innovations in single-image portrait relighting include:

The field continues to advance toward physically accurate, identity-preserving, highly controllable portrait relighting from a single unconstrained input, leveraging synergistic progress in generative modeling, large-scale data collection, and computationally efficient rendering.
