Free-Viewpoint Relighting
- Free-viewpoint relighting is the synthesis of photorealistic images under novel viewpoints and lighting by disentangling geometry, reflectance, and illumination.
- Modern methods leverage neural implicit representations, spherical harmonics, and hybrid explicit-implicit models to achieve realistic rendering across diverse scenes.
- Advanced training protocols and loss functions ensure robust material generalization, view consistency, and efficient real-time performance.
Free-viewpoint relighting is the task of synthesizing photorealistic images of a scene, object, or dynamic subject under novel combinations of viewpoint and illumination—both of which need not be present in the original capture set. It requires disentanglement of geometry, reflectance, and illumination, supporting editing and rendering from arbitrary camera positions and lighting configurations. Modern advances in free-viewpoint relighting leverage neural implicit representations (notably, Neural Radiance Fields and related models), explicit geometry proxies, physically-based rendering, and specialized network architectures to achieve this capability across a range of domains, from controlled tabletop scenes to challenging outdoor environments, human faces, and dynamic subjects.
1. Theoretical Foundations and Scene Factorization
Free-viewpoint relighting builds directly on the generalized rendering equation, which expresses the outgoing radiance at a point $\mathbf{x}$ in view direction $\omega_o$ as an integral over incident directions $\omega_i$:

$$
L_o(\mathbf{x}, \omega_o) = \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, V(\mathbf{x}, \omega_i)\, (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i
$$

Here, $f_r$ is the BRDF, $V$ is a visibility term, $L_i$ denotes the incident lighting, and $\mathbf{n}$ is the surface normal. Free-viewpoint relighting requires both explicit disentanglement of the factors $f_r$, $L_i$, and $V$, and an efficient reparameterization that allows practical control and synthesis.
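As a concrete illustration, this integral can be approximated by a sum over sampled incident directions. The sketch below assumes a Lambertian BRDF and uniform hemisphere samples; all names are illustrative rather than drawn from any cited method.

```python
import numpy as np

def render_point(albedo, normal, light_dirs, light_radiance, visibility):
    """Discretized rendering equation at a single surface point.

    albedo:         (3,) diffuse reflectance (Lambertian BRDF = albedo / pi)
    normal:         (3,) unit surface normal n
    light_dirs:     (K, 3) unit incident directions omega_i on the hemisphere
    light_radiance: (K, 3) incident radiance L_i(omega_i)
    visibility:     (K,) binary or soft visibility V(x, omega_i)
    """
    cos_theta = np.clip(light_dirs @ normal, 0.0, None)      # clamped (omega_i . n)
    brdf = albedo / np.pi                                    # Lambertian f_r
    d_omega = 2.0 * np.pi / light_dirs.shape[0]              # solid angle per uniform sample
    integrand = brdf[None, :] * light_radiance * (visibility * cos_theta)[:, None]
    return integrand.sum(axis=0) * d_omega                   # outgoing radiance L_o
```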
Approaches differ in how they realize this decomposition:
- NeRF-OSR (Rudnev et al., 2021) factors radiance into spatially-varying albedo and a shading term dependent on surface normals and lighting, with explicit lighting via 2nd-order spherical harmonics (SH).
- Relightable Neural Renderer (RNR) (Chen et al., 2019) learns a light transport function that integrates visibility and the BRDF, and parameterizes illumination as SH, separating diffuse and specular contributions.
- Object-Centric Neural Scattering Functions (OSFs) (Yu et al., 2023) directly learn a cumulative transfer function which absorbs both direct and subsurface (multiple-bounce) contributions, suitable for both opaque and translucent materials.
- Dynamic/Portrait Modeling extends the factorization to time-varying scenes and adopts neural-field representations anchored to deformable skeletons or mesh proxies (Chen et al., 2022, Sevastopolsky et al., 2020).
- 3D Gaussian Splatting with PRT (Zhang et al., 10 Aug 2024, Fan et al., 7 Mar 2025) learns per-Gaussian SH or light-conditioned colors for efficient relighting in explicit point-based models.
Lighting is typically parameterized via Spherical Harmonics (order 2–10 for practical trade-offs), learned basis functions, or, in wavelet-based approaches, via Haar (or other) wavelet decompositions to efficiently represent all-frequency lighting (Raghavan et al., 2023).
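A minimal sketch of second-order SH shading as used to parameterize low-frequency lighting: the nine real SH basis constants below are standard, while the function names and per-channel coefficient layout are assumptions for illustration.

```python
import numpy as np

def sh_basis_order2(n):
    """Real spherical-harmonic basis (9 terms, bands 0-2) at unit direction n = (x, y, z)."""
    x, y, z = n
    return np.array([
        0.282095,                    # Y_0,0
        0.488603 * y,                # Y_1,-1
        0.488603 * z,                # Y_1,0
        0.488603 * x,                # Y_1,1
        1.092548 * x * y,            # Y_2,-2
        1.092548 * y * z,            # Y_2,-1
        0.315392 * (3 * z * z - 1),  # Y_2,0
        1.092548 * x * z,            # Y_2,1
        0.546274 * (x * x - y * y),  # Y_2,2
    ])

def sh_shading(sh_coeffs, normal):
    """Shading as a dot product between lighting coefficients and the basis at the normal.

    sh_coeffs: (9, 3) per-channel lighting coefficients (e.g., learned per image)
    normal:    (3,) unit surface normal
    """
    return sh_basis_order2(normal) @ sh_coeffs   # (3,) RGB shading factor
```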
2. Neural Representations and Rendering Pipelines
The practical realization of free-viewpoint relighting depends on scene representation and the rendering algorithm:
- Implicit volumetric fields (NeRF, NeuS, and their extensions) model density and appearance as functions of 3D position, view direction, and, in relighting extensions, light direction or SH coefficients. Volume rendering is performed along camera rays, integrating density-weighted colors. For relighting, lighting conditions are injected as additional MLP inputs, or through learned SH codebooks (Rudnev et al., 2021, Yu et al., 2023, Toschi et al., 2023).
- Scene proxies such as MVS-reconstructed meshes (Philip et al., 2021), coarse point clouds (Sevastopolsky et al., 2020), or parametric facial bases (Wang et al., 2022), provide explicit geometry over which shading and appearance are defined. These enable precomputation and path-tracing of irradiance or mirror images, and function as the substrate for U-Net/CNN-based neural renderers.
- 3D Gaussian Splatting (Zhang et al., 10 Aug 2024, Fan et al., 7 Mar 2025, He et al., 10 Oct 2024) deploys explicit, efficient, rasterizable representations in which Gaussians are projected to the image plane and their per-Gaussian colors are modulated via SH-based or light-conditioned neural decoders.
- Hybrid explicit-implicit: Some methods learn a compact feature field (e.g., via tensor decomposition) supporting neural MLP lookup conditioned on features, BRDF parameters, and direction (Raghavan et al., 2023).
Rendering in most models is accomplished through volumetric ray-marching, NeRF-style alpha compositing, or analytic summation in the basis of SH or wavelets, supporting efficient real-time inference in many recent systems (Zhang et al., 10 Aug 2024, Raghavan et al., 2023).
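The volume-rendering step shared by the implicit approaches above amounts to standard alpha compositing along each camera ray; a minimal sketch, not tied to any particular codebase:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """NeRF-style alpha compositing of samples along one ray.

    densities: (S,)   volume density sigma at each sample
    colors:    (S, 3) per-sample color (possibly light-conditioned)
    deltas:    (S,)   distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-densities * deltas)                        # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]    # accumulated transmittance
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0)                   # composited pixel color
```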
3. Illumination Modeling and Control
Relighting requires precise description and manipulation of illumination:
- Spherical Harmonics (typically 2nd–10th order): Used per-image or per-scene to parameterize environmental lighting. Lower orders (e.g., order 2 or 3) are computationally efficient but capture only low-frequency lighting; higher orders enable sharper shadows but increase compute and the risk of overfitting (Rudnev et al., 2021, Zhang et al., 10 Aug 2024, Toschi et al., 2023).
- Explicit Light Direction Conditioning: Many methods, especially those using OLAT (One-Light-at-a-Time) acquisition setups, inject light direction (and optionally, size for area lights) directly into the network alongside spatial coordinates (Sevastopolsky et al., 2020, He et al., 10 Oct 2024, Toschi et al., 2023).
- Wavelet or Learned Basis: High-frequency lighting effects such as caustics and sharp glossy reflections are efficiently parameterized using Haar wavelet decompositions, with lighting and transport both projected into the same basis (Raghavan et al., 2023).
- Precomputed Radiance Transfer (PRT): Compact per-point transfer coefficients (SH or wavelet) are learned and modulated at run-time by projecting any desired illumination (including HDR environmental maps) into the same basis; the dot product yields the outgoing radiance (Zhang et al., 10 Aug 2024, Raghavan et al., 2023).
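Under a PRT formulation, relighting reduces to a dot product between the learned per-point transfer coefficients and the lighting coefficients expressed in the same basis. A minimal sketch; the array shapes and names are assumptions for illustration.

```python
import numpy as np

def prt_relight(transfer_coeffs, light_coeffs):
    """Precomputed radiance transfer: outgoing radiance as a per-point dot product.

    transfer_coeffs: (N, B, 3) learned per-point (or per-Gaussian) transfer coefficients
    light_coeffs:    (B, 3)    target illumination projected into the same B-term basis
    """
    # For each point and color channel: sum_b T[n, b, c] * L[b, c]
    return np.einsum('nbc,bc->nc', transfer_coeffs, light_coeffs)
```

Because the transfer coefficients are fixed after training, swapping in new lighting coefficients relights the scene without re-evaluating the network.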
Some pipelines allow users to edit SH coefficients interactively or supply entire HDR environment maps for arbitrary relighting (Rudnev et al., 2021, Zhang et al., 10 Aug 2024, Philip et al., 2021).
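Supplying an HDR environment map requires projecting it into the chosen lighting basis first. Below is a minimal sketch for the 9-term SH case, assuming an equirectangular latitude-longitude map; the helper names are hypothetical.

```python
import numpy as np

def sh9_basis(dirs):
    """Vectorized 9-term real SH basis at unit directions; dirs: (..., 3) -> (..., 9)."""
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], axis=-1)

def project_envmap_to_sh(envmap):
    """Project an equirectangular HDR map (H, W, 3) onto 9 SH coefficients per channel."""
    H, W, _ = envmap.shape
    theta = (np.arange(H) + 0.5) / H * np.pi                  # polar angle per row
    phi = (np.arange(W) + 0.5) / W * 2.0 * np.pi              # azimuth per column
    theta, phi = np.meshgrid(theta, phi, indexing='ij')
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)                 # (H, W, 3) unit directions
    d_omega = np.sin(theta) * (np.pi / H) * (2.0 * np.pi / W) # per-pixel solid angle
    basis = sh9_basis(dirs)                                   # (H, W, 9)
    return np.einsum('hwb,hwc->bc', basis, envmap * d_omega[..., None])  # (9, 3)
```

The resulting coefficients can then be consumed directly by an SH-based shading or PRT model such as the sketches above.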
4. Training Protocols and Loss Functions
Training free-viewpoint relighting systems demands data that jointly covers view and lighting axes, network architectures able to disentangle intrinsic factors, and objectives stabilizing geometry and appearance regressions:
- Acquisition Strategies:
- OLAT: Robotic setups or light stages sample dense combinations of viewpoints and known light positions (Toschi et al., 2023, Zhang et al., 10 Aug 2024, He et al., 10 Oct 2024).
- Environment Variation: Outdoor and uncontrolled indoor scenes require methods to learn SH coefficients directly from real photographs without explicit light annotations (Rudnev et al., 2021, Philip et al., 2021).
- Dynamic Capture: Temporal sequences (e.g., talking faces, moving humans) rely on a fixed base lighting for geometry and descriptors, while relighting is learned via supervised, self-supervised, or diffusion-based mappings from flat-lit or OLAT data (He et al., 10 Oct 2024, Chen et al., 2022).
- Losses (a minimal composite-objective sketch follows this list):
- Photometric loss (MSE/L1) on rendered outputs vs. ground truth.
- Perceptual metrics (e.g., VGG, LPIPS) to stabilize color/detail.
- Shadow regularizers and multi-channel disentanglement losses (e.g., pushing shadow maps to unity except where needed, supervising albedo/normals) (Rudnev et al., 2021, Sevastopolsky et al., 2020).
- Geometry/Temporal Consistency: Eikonal loss for SDF fields (Zeng et al., 2023), temporal coherence for video sequences (Cai et al., 24 Oct 2024), and mesh alignment or multi-view consistency for explicit representations (Wang et al., 2022).
- Adversarial or multi-scale perception terms in some U-Net based renderers for enhanced high-frequency details (Philip et al., 2021, Cai et al., 24 Oct 2024).
- Ablations confirm that visibility modeling (e.g., neural visibility branches for shadows), careful light-direction encoding, and disentanglement of shadow and specularity cues are critical for accurate free-viewpoint relighting (Toschi et al., 2023, Zeng et al., 2023).
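A hedged sketch of how such terms are commonly combined into a single objective, assuming PyTorch and the `lpips` package; the weights, argument names, and exact regularizers vary across the cited systems.

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual similarity metric with a pretrained backbone

lpips_fn = lpips.LPIPS(net='vgg')

def relighting_loss(pred, gt, shadow_map=None, sdf_grad=None,
                    w_photo=1.0, w_perc=0.1, w_shadow=0.01, w_eik=0.1):
    """Illustrative composite objective; individual methods use only a subset of terms.

    pred, gt:   (B, 3, H, W) rendered and ground-truth images scaled to [-1, 1]
    shadow_map: (B, 1, H, W) predicted shadow factors in [0, 1], optional
    sdf_grad:   (M, 3) SDF gradients at sampled points, optional (eikonal term)
    """
    loss = w_photo * F.mse_loss(pred, gt)                    # photometric term
    loss = loss + w_perc * lpips_fn(pred, gt).mean()         # perceptual (LPIPS/VGG) term
    if shadow_map is not None:
        # push shadow maps toward 1 (unshadowed) unless the data demands otherwise
        loss = loss + w_shadow * (1.0 - shadow_map).abs().mean()
    if sdf_grad is not None:
        # eikonal regularizer: encourage unit-norm SDF gradients
        loss = loss + w_eik * ((sdf_grad.norm(dim=-1) - 1.0) ** 2).mean()
    return loss
```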
5. Scene and Material Generalization
A central challenge is relighting under material diversity (e.g., gloss, translucency) and scene complexity:
- Opaque/Non-Lambertian Materials: Specular and glossy effects are handled by neural branches/feature fields tuned to mirror-like reflectances (e.g., via learned BRDF approximation or explicit reflective "mirror" maps) (Chen et al., 2019, Philip et al., 2021, Raghavan et al., 2023).
- Translucency/Subsurface Scattering: OSF and similar models bypass direct Monte Carlo simulation by training from data, allowing accurate reproduction of translucent materials (e.g., soaps, plastics) and scattering volumes without nested integrals (Yu et al., 2023, Zeng et al., 2023).
- Dynamic Subjects and Scene Composition: 4D neural fields anchored to deforming skeletons or point clouds enable relighting of human performances, with joint modeling of per-vertex (or per-point) geometry, normals, occlusions, and spatially-varying reflectance (Chen et al., 2022, Cai et al., 24 Oct 2024, Sevastopolsky et al., 2020).
Recent research demonstrates practical solutions spanning indoor, outdoor, object-centric, and full-body applications with domain-appropriate modifications in pipeline and architecture.
6. Quantitative Evaluation and Benchmarks
Performance is systematically evaluated using PSNR, SSIM, LPIPS, and task-specific metrics (e.g., Lighting Error, FID, temporal consistency):
| Method/Paper | Domain | PSNR (dB) | SSIM | LPIPS | Frame Rate | Comment |
|---|---|---|---|---|---|---|
| NeRF-OSR (Rudnev et al., 2021) | Outdoor | 18.7–19.9 | 0.47 | — | — | High shadow realism, SH lighting |
| Relightable 3D Head Portraits (Sevastopolsky et al., 2020) | Faces | — | — | 0.081–0.14 | ~30Hz | Best VGG/LPIPS vs. DPR |
| SunStage (Wang et al., 2022) | Faces (outdoor) | 23–25 | 0.83–0.84 | 0.09–0.10 | — | Outdoor, sunlight as light stage |
| ReLight My NeRF (Toschi et al., 2023) | Tabletop | 25.8–26.1 | 0.61 | — | — | OLAT, split RGB/vis branch |
| PRTGaussian (Zhang et al., 10 Aug 2024) | Objects | 33.6 | 0.94 | 0.026 | 333Hz | Real-time, SH-PRT, 3DGS |
| OSF (Yu et al., 2023) | Translucent | 39.1 | 0.97–0.98 | 0.006 | 0.27–16Hz | Opaque/translucent, scene composition |
Where frame rate is reported, real-time (>30 Hz) is now accessible with explicit or tensor/hashing-based representations paired with lightweight neural decoders (Zhang et al., 10 Aug 2024, Raghavan et al., 2023, He et al., 10 Oct 2024, Fan et al., 7 Mar 2025). Generalization to novel view/light, sharp shadow and specular reproduction, and support for difficult materials are typically showcased in qualitative studies.
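For reference, the standard image metrics can be computed with off-the-shelf tooling. The sketch below assumes scikit-image (0.19+ for `channel_axis`); LPIPS is typically computed separately with the `lpips` package and a pretrained VGG or AlexNet backbone.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, gt):
    """PSNR and SSIM for one rendered/ground-truth image pair.

    pred, gt: (H, W, 3) float arrays in [0, 1]
    """
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```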
7. Current Challenges and Future Directions
Despite rapid advancement, open problems remain:
- Material/Lighting Generalization: Environment SH truncation and wavelet representations trade off sharp shadows for computational efficiency; reconstructing all-frequency lighting and high-gloss mirror reflections remains difficult (Raghavan et al., 2023).
- Data Requirements: Supervised outdoor or uncontrolled indoor relighting is fundamentally limited by the range of observed lighting; unlabeled or in-the-wild methods require robust regularizers and priors (Rudnev et al., 2021, Philip et al., 2021).
- Dynamic/Interactive Scenes: Efficient updating, consistent temporal appearance, and compositional modeling for scenes with interacting, moving, or deformable actors remain areas of active research (He et al., 10 Oct 2024, Chen et al., 2022).
- Efficiency and Scalability: Further compression via hashing, factorization, or local model distillation (e.g., KiloOSF) is key for large/complex scenes and mobile/edge deployment (Yu et al., 2023, Zhang et al., 10 Aug 2024).
- Viewpoint and Lighting Coverage: Empirical recommendations for multi-orbit camera/light coverage in object setups and for turntable-based capture help standardize scene acquisition for high-quality relighting (Fan et al., 7 Mar 2025).
A plausible implication is that hybrid approaches combining explicit and implicit representations, high-order basis expansions, and data-driven shadow/specular encoding will further close the realism gap under both novel lighting and viewpoint, even in uncontrolled and dynamic scenarios. Continued standardization of benchmarks (e.g., ReLight My NeRF) and data acquisition pipelines will facilitate more rigorous comparison and drive methodological convergence.