
Portrait Video Relighting Pipeline

Updated 20 April 2026
  • Portrait video relighting pipelines are systems that modify illumination in videos while preserving identity, geometry, and photorealism under novel lighting.
  • They integrate techniques like 3D Gaussian Splatting, mesh-based tracking, and neural decoders to separate static and dynamic appearance features for realistic light editing.
  • Evaluations report PSNR, SSIM, and LPIPS for image quality alongside real-time performance (~20 FPS), with robust identity and expression preservation for VR/AR and digital entertainment.

A portrait video relighting pipeline comprises the algorithms, data representations, and rendering techniques that modify illumination in portrait videos while maintaining identity, geometry, and photorealism under novel lighting conditions or environments. Modern pipelines combine statistical head/body priors, explicit or hybrid volumetric appearance models, deformation fields or mesh-based tracking, and high-efficiency rasterization schemes (most recently 3D Gaussian Splatting, 3DGS) to achieve these goals in real time. The shift from neural radiance fields to Gaussian splatting for portrait avatars has enabled real-time rendering rates with strong expressivity and editability, with direct impact on telepresence, AR, VR, and digital entertainment systems.

1. Representation of Portrait Geometry and Appearance

State-of-the-art portrait relighting approaches rely on explicit volumetric representations for both static structure and expression-dependent dynamics. The dominant paradigm is 3D Gaussian Splatting, where each avatar is encoded by a set of approximately $10^4$–$10^5$ parametric Gaussian primitives, each defined by position $\mu \in \mathbb{R}^3$, covariance $\Sigma \in \mathbb{R}^{3 \times 3}$ (often factored as rotation + scale), color $c \in \mathbb{R}^3$ (in RGB or learned coefficients, e.g., spherical harmonics), and opacity $\alpha$ (Guo et al., 19 Apr 2025, Zhang et al., 17 Apr 2025).
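The "rotation + scale" factorization of the covariance can be illustrated with a minimal NumPy sketch. It assumes a unit quaternion in (w, x, y, z) order and per-axis scales; the function names and the example values are hypothetical and are not taken from the cited pipelines.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_from_rotation_scale(q, scale):
    """Build Sigma = R S S^T R^T, which is symmetric positive semi-definite."""
    R = quat_to_rotmat(q)
    S = np.diag(np.asarray(scale, dtype=float))
    M = R @ S
    return M @ M.T

# One hypothetical Gaussian primitive (illustrative values only).
gaussian = {
    "mu": np.array([0.0, 0.1, 0.5]),
    "sigma": covariance_from_rotation_scale([1.0, 0.0, 0.0, 0.0], [0.02, 0.01, 0.005]),
    "color": np.array([0.8, 0.6, 0.5]),
    "opacity": 0.9,
}
```

Optimizing a rotation and a scale instead of the nine matrix entries keeps $\Sigma$ a valid covariance throughout training, which is the usual motivation for this factorization.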

Portrait geometry is parametrized via either FLAME or SMPL-X 3DMMs, whose surfaces are further deformed via data-driven displacement fields or residual per-vertex offsets learned from large-scale, multi-view or monocular datasets:

  • Fine facial geometry (wrinkles, lips, teeth) is obtained by learning a displacement field over UV-space, typically via a VAE, with the final posed geometry

$$\hat M_\mathrm{pos}(u,v) = M_\mathrm{pos}(u,v) + M_\mathrm{disp}(u,v)$$

as in the SEGA pipeline (Guo et al., 19 Apr 2025).

  • In both whole-body and head-specific avatars, canonical meshes provide topology and serve as the frame of reference for all further deformation and parameter prediction (e.g., skinning for animation).
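The UV-space displacement described in the list above amounts to a per-texel addition of two maps. A minimal sketch, assuming both maps are stored as (H, W, 3) arrays and using a hypothetical nearest-texel lookup (the cited pipeline may sample differently):

```python
import numpy as np

def apply_uv_displacement(pos_map, disp_map):
    """Add a per-texel displacement to a posed UV position map.

    pos_map, disp_map: arrays of shape (H, W, 3) with 3D positions and offsets.
    """
    if pos_map.shape != disp_map.shape:
        raise ValueError("position and displacement maps must share a shape")
    return pos_map + disp_map

def sample_uv(position_map, u, v):
    """Nearest-texel lookup of a 3D point at UV coordinates in [0, 1]."""
    h, w, _ = position_map.shape
    row = min(int(v * (h - 1) + 0.5), h - 1)
    col = min(int(u * (w - 1) + 0.5), w - 1)
    return position_map[row, col]

# Hypothetical 4x4 maps: a flat base surface plus a small offset along z.
base = np.zeros((4, 4, 3))
offset = np.zeros((4, 4, 3))
offset[..., 2] = 0.01
posed = apply_uv_displacement(base, offset)
point = sample_uv(posed, 0.5, 0.5)
```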

The hybridization of static (expression-invariant) and dynamic (expression-varying) components, typically via dual-branch Gaussian decoding in UV or mesh space, is critical for achieving disentangled control over expression and lighting (Guo et al., 19 Apr 2025).

2. Relighting-Specific Conditioning and Decomposition

The distinguishing feature of a relighting pipeline is the explicit modeling and manipulation of appearance under novel illumination. This is approached at multiple levels:

  • Spherical Harmonics or Neural Appearance Models: Per-Gaussian color is either directly learned in a low-dimensional basis (e.g., spherical harmonics), or governed by neural networks that decode both color and view/illumination dependence (Zhang et al., 17 Apr 2025, Guo et al., 19 Apr 2025).
  • Lighting Conditioning: While most published avatar pipelines focus on identity, pose, and expression decomposition, several extend the parameterization to include incident illumination, enabling relighting by swapping lighting codes or by embedding Gaussians in canonical UV/mesh domains and predicting per-Gaussian color conditioned on lighting.
  • Static vs. Dynamic Decoders: Static (precomputed) branches represent regions insensitive to expression and can be lit independently or precomputed under multiple lighting scenarios. Dynamic regions (mouth, face interior) have their parameters predicted conditionally, enabling fine-grained relighting responsive to both local and global appearance.

In SEGA, color prediction for each UV location is performed by small U-Net–like decoders that receive identity and expression codes (Guo et al., 19 Apr 2025). The same decoders could be extended to take lighting descriptors as an additional input, which would allow style transfer or lighting edits at inference time by changing only the lighting descriptor.
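The idea of conditioning color prediction on a lighting descriptor can be sketched with a toy decoder. This is not the SEGA architecture: a two-layer MLP with random weights stands in for the U-Net–like decoders, and the class name, code dimensions, and lighting vectors are all hypothetical.

```python
import numpy as np

class LightingConditionedColorDecoder:
    """Toy two-layer MLP: (identity, expression, lighting) codes -> per-Gaussian RGB."""

    def __init__(self, id_dim, expr_dim, light_dim, num_gaussians, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = id_dim + expr_dim + light_dim
        self.num_gaussians = num_gaussians
        self.w1 = rng.normal(0.0, 0.5, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, num_gaussians * 3))
        self.b2 = np.zeros(num_gaussians * 3)

    def __call__(self, id_code, expr_code, light_code):
        x = np.concatenate([id_code, expr_code, light_code])
        h = np.tanh(x @ self.w1 + self.b1)
        logits = h @ self.w2 + self.b2
        rgb = 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps colors in (0, 1)
        return rgb.reshape(self.num_gaussians, 3)

decoder = LightingConditionedColorDecoder(id_dim=8, expr_dim=4, light_dim=4, num_gaussians=16)
id_code = np.full(8, 0.1)
expr_code = np.full(4, -0.2)
light_a = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical lighting descriptor A
light_b = np.array([0.0, 0.0, 1.0, 0.0])  # hypothetical lighting descriptor B

colors_a = decoder(id_code, expr_code, light_a)
colors_b = decoder(id_code, expr_code, light_b)  # same subject and expression, new light
```

Only the lighting code changes between the two calls, so any difference between `colors_a` and `colors_b` is attributable to illumination, which is the kind of disentangled edit the text describes.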

3. Volumetric Splatting and Differentiable Rendering

Splatting-based rendering projects each 3D Gaussian to the image plane, where it becomes an elliptical footprint whose 2D covariance and color are computed via differentiable rasterization. The rendering equation per frame becomes:

$$C_\mathrm{img} = \sum_{i=1}^N c_i a_i \prod_{j<i} (1 - a_j)$$

where $a_i$ depends on $\alpha_i$ and the Jacobian-projected covariance to the screen (Guo et al., 19 Apr 2025, Zhang et al., 17 Apr 2025). All compositing and filtering operations are fully differentiable and optimized for high-throughput GPU execution (e.g., via the open-source gsplat kernel).
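The compositing rule above can be written out for a single pixel. The following NumPy sketch assumes the splats are already sorted near to far and that each per-pixel weight is the opacity multiplied by a 2D Gaussian falloff; it is a plain CPU illustration with hypothetical function names, not the gsplat kernel.

```python
import numpy as np

def splat_alpha(pixel, center_2d, cov_2d, opacity):
    """Per-pixel weight a_i: opacity times the projected 2D Gaussian falloff."""
    d = np.asarray(pixel, dtype=float) - np.asarray(center_2d, dtype=float)
    power = -0.5 * d @ np.linalg.inv(cov_2d) @ d
    return float(opacity * np.exp(power))

def composite_front_to_back(colors, alphas):
    """C = sum_i c_i a_i prod_{j<i} (1 - a_j), with splats sorted near to far."""
    pixel = np.zeros(3)
    transmittance = 1.0  # running product of (1 - a_j) over closer splats
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return pixel

# Two splats covering the same pixel: red in front of green.
result = composite_front_to_back([[1, 0, 0], [0, 1, 0]], [0.5, 0.5])
# result is [0.5, 0.25, 0.0]
```

The second splat contributes only a quarter of its color because half of the light is already absorbed by the first one.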

Rendering under novel lighting is supported by re-evaluating the color function of each Gaussian conditioned on the desired illumination, possibly in combination with per-Gaussian spherical harmonics shading. Real-time performance is routinely achieved, with render times per frame well below 16 ms for $10^4$–$10^5$ splats (Zhang et al., 17 Apr 2025, Guo et al., 19 Apr 2025).

4. Training Losses, Identity, and Relighting Fidelity

Pipelines for portrait video relighting are trained with a combination of geometric, photometric, perceptual, and identity-preserving losses:

  • Geometry: VAE-based depth and normal consistency for mesh and Gaussian geometry recovery, Laplacian smoothness, and normal regularization for stability (Guo et al., 19 Apr 2025).
  • Photometry: pixel-wise reconstruction, SSIM, and perceptual (VGG, LPIPS) losses between rendered and ground-truth RGB images (Zhang et al., 17 Apr 2025, Guo et al., 19 Apr 2025).
  • Identity: ArcFace or similar deep-embedding losses to guarantee that relit outputs preserve subject identity.
  • Optional: explicit lighting consistency and multi-illumination matching, leveraging samples captured under distinct lighting; future work is likely to exploit relighting-specific datasets.
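How such terms combine into one objective can be sketched as a weighted sum. The example below is an illustration under stated assumptions: an L1 pixel loss is assumed for the photometric term, the identity term uses precomputed embedding vectors in place of an actual ArcFace network, SSIM and LPIPS are omitted because they need external models, and the weights are hypothetical.

```python
import numpy as np

def l1_photometric(rendered, target):
    """Mean absolute pixel difference (assumed choice of photometric norm)."""
    return float(np.mean(np.abs(rendered - target)))

def identity_loss(emb_rendered, emb_target):
    """1 - cosine similarity between face-recognition embeddings."""
    cos = emb_rendered @ emb_target / (
        np.linalg.norm(emb_rendered) * np.linalg.norm(emb_target)
    )
    return float(1.0 - cos)

def laplacian_smoothness(vertices, neighbors):
    """Mean squared distance between each vertex and its neighbors' centroid."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        centroid = vertices[nbrs].mean(axis=0)
        total += float(np.sum((vertices[i] - centroid) ** 2))
    return total / len(neighbors)

def total_loss(rendered, target, emb_rendered, emb_target, vertices, neighbors,
               w_photo=1.0, w_id=0.1, w_smooth=0.01):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_photo * l1_photometric(rendered, target)
            + w_id * identity_loss(emb_rendered, emb_target)
            + w_smooth * laplacian_smoothness(vertices, neighbors))
```

In a real training loop these terms would be computed in an autodiff framework so that gradients reach the Gaussian and decoder parameters; NumPy is used here only to make the arithmetic explicit.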

Fine-tuning steps allow person-specific refinement, crucial for high-fidelity, expression-rich synthesis. Real-time drivable avatars, once trained, support relighting at interactive rates without any per-frame optimization (Guo et al., 19 Apr 2025).

5. Evaluation Metrics and Performance

Pipelines are evaluated on PSNR, SSIM, LPIPS (for perceptual similarity), and identity preservation (e.g., cosine similarity in ArcFace space) (Guo et al., 19 Apr 2025). For frontal views, SEGA achieves PSNR ≈ 24.99, SSIM ≈ 0.8246, and LPIPS ≈ 0.2305, significantly outperforming earlier one-shot or NeRF-based head avatars (Guo et al., 19 Apr 2025).
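Two of these metrics are simple enough to state directly. The sketch below computes PSNR for images scaled to [0, 1] and the cosine similarity between two embedding vectors; the embeddings are assumed to come from a face-recognition model such as ArcFace, which is not included here, and the function names are hypothetical.

```python
import numpy as np

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def identity_similarity(emb_a, emb_b):
    """Cosine similarity between two identity embeddings (1.0 means same direction)."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
score = psnr(np.full((4, 4, 3), 0.1), np.zeros((4, 4, 3)))
```

Higher PSNR and identity similarity are better, whereas LPIPS is a distance and lower values are better.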

Key system-level performance results:

  • Real-time GS generation, fusion, sampling, and rendering sum to ≈50 ms/frame (≈20 FPS) on GPU for avatar models of $10^4$–$10^5$ Gaussians (Zhang et al., 17 Apr 2025, Guo et al., 19 Apr 2025).
  • Quality under novel lighting and expression regimes is empirically improved by hierarchical UV-organization and dual-branch separation, which encourages identity and expression disentanglement and robust generalization beyond training conditions (Guo et al., 19 Apr 2025).
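As a quick check on the timing figures quoted in this article, a frame time $t_\mathrm{ms}$ in milliseconds converts to a frame rate as

$$\mathrm{FPS} = \frac{1000}{t_\mathrm{ms}}, \qquad \frac{1000}{50} = 20\ \mathrm{FPS}, \qquad \frac{1000}{16} = 62.5\ \mathrm{FPS}.$$

An end-to-end budget of about 50 ms per frame therefore corresponds to roughly 20 FPS, while a rasterization step that stays under 16 ms would, taken alone, fit a 60 FPS display budget. The gap between the two figures is consistent with the generation, fusion, and sampling stages accounting for most of the per-frame cost, although the source does not give a per-stage breakdown.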

6. Limitations and Future Directions

Current pipelines are limited by the coverage of facial and hair geometries: regions unseen in training data may exhibit artifacts, and reliable relighting requires well-calibrated UV/mesh tracking. Unsupervised or GAN-driven priors are proposed to hallucinate plausible geometry in occluded or shadowed regions; modular clothing/appearance factorization and live MoCap-driven relighting are cited as future research avenues (Zhang et al., 17 Apr 2025, Guo et al., 19 Apr 2025).

Prospective improvements include:

  • Integration of physics or GAN-driven relighting modules for environment-aware shading.
  • Modular fine-tuning of Gaussian sets for garment or hair editing and appearance retargeting.
  • Enhanced learning frameworks for dense-to-sparse coverage and continuous level-of-detail adaptation.

7. Broader Context and Significance

Portrait video relighting pipelines based on hierarchical Gaussian splatting represent a convergence of 3DMM parametric modeling, explicit volumetric rendering, and differentiable neural inference. These pipelines achieve near-instantaneous inference, high photorealism, robust expression and identity preservation, and enable practical deployment in VR/AR telepresence, digital entertainment, and communication systems. The modular separation into mesh-based geometry, UV-organized Gaussians, and lighting-conditional decoding provides a blueprint for further research at the intersection of rendering, vision, and graphics (Guo et al., 19 Apr 2025, Zhang et al., 17 Apr 2025).

References (2)
