Generative Relighting Methods
- Generative relighting is a class of learning-based methods that synthesize images or radiance fields under modified lighting while preserving scene structure.
- It employs diverse approaches—from GANs and latent-space disentanglement to physics-based and diffusion models—for controlled global, local, and text-guided illumination edits.
- Recent advances leverage robust datasets and innovative latent representations, yet challenges remain in handling complex material effects and ensuring multi-view consistency.
Generative relighting encompasses a class of algorithms and neural architectures that synthesize images or radiance fields representing a given scene under modified lighting conditions. Unlike classic inverse rendering, which seeks explicit scene decompositions (geometry, albedo, normals, light), generative relighting applies learning-based, often implicit methods to predict plausible appearance changes corresponding to arbitrary or controlled illumination edits. These advances support applications in image editing, photorealistic synthesis, data augmentation, 3D reconstruction, and video post-production.
1. Core Principles and Challenges
Generative relighting aims to manipulate illumination in existing images or 3D representations while preserving scene structure, material appearance, and photorealistic fidelity. The fundamental challenge is the ill-posed mapping from observed pixel intensities to plausible appearance under new lighting: geometry, materials, and occlusions are not directly observable from a single image or a small set of views.
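Concretely, most methods learn a conditional mapping of the following generic form (the notation is illustrative rather than taken from any single cited paper):

```latex
\hat{I}_{\ell} = f_\theta(I, \ell),
\qquad
\theta^{\star} = \arg\min_\theta \;
\mathbb{E}_{(I,\,\ell,\,I^{\mathrm{gt}}_{\ell})}
\Big[ \mathcal{L}\big( f_\theta(I, \ell),\ I^{\mathrm{gt}}_{\ell} \big) \Big]
```

where I is the observed image or set of views, ℓ encodes the target lighting (a direction, an HDR environment map, or a text prompt), and the loss combines reconstruction, perceptual, and/or adversarial terms depending on the method.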
A critical distinction arises between methods that:
- Explicitly estimate physical scene parameters (normals, BRDFs, lighting) and resynthesize with graphics engines or shading modules.
- Learn implicit, data-driven mappings (e.g., via GANs, diffusion models, or neural radiance fields) that capture appearance variation from data, often bypassing explicit inverse rendering, and instead representing effects such as shadows and specularities through latent neural mechanisms.
Traditionally, global relighting assumes off-image or infinitely distant lights, while recent developments target localized or per-light-source control and text-driven lighting edits.
2. Methodological Advances
2.1 Classical GAN and Image-to-Image Translation Approaches
Early work frames relighting as a conditional image-to-image translation task. For example, (Gafton et al., 2020) employs the pix2pix conditional GAN framework, using a U-Net generator and a patch-based discriminator to map input images with a known light-source direction to a target direction; the losses balance an L1 term (for global structure) against an adversarial objective (for high-frequency detail). Separate networks are trained, one per target light direction, and an auxiliary CNN classifies the source light direction of the input image in order to select the appropriate relighter.
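A minimal PyTorch sketch of this combined objective is given below; the `generator` and `discriminator` modules and the L1 weight (the common pix2pix default of 100) are placeholders rather than the exact configuration of Gafton et al.:

```python
import torch
import torch.nn.functional as F

def pix2pix_relight_losses(generator, discriminator, src_img, tgt_img, lambda_l1=100.0):
    """One training step's losses for a pix2pix-style relighter.

    src_img: input image under the source light direction (B, 3, H, W)
    tgt_img: ground-truth image under the target light direction (B, 3, H, W)
    """
    fake = generator(src_img)                      # relit prediction

    # Generator: fool the patch discriminator while staying close to the target in L1.
    pred_fake = discriminator(torch.cat([src_img, fake], dim=1))
    g_adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    g_l1 = F.l1_loss(fake, tgt_img)
    g_loss = g_adv + lambda_l1 * g_l1

    # Discriminator: real (src, target) pairs vs. fake (src, generated) pairs.
    pred_real = discriminator(torch.cat([src_img, tgt_img], dim=1))
    pred_fake_d = discriminator(torch.cat([src_img, fake.detach()], dim=1))
    d_loss = 0.5 * (
        F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
        + F.binary_cross_entropy_with_logits(pred_fake_d, torch.zeros_like(pred_fake_d))
    )
    return g_loss, d_loss
```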
2.2 Latent-Space Disentanglement and Overcomplete Representations
Subsequent methods move towards disentangling subject appearance from illumination as separate latent codes. Notably, (Song et al., 2021) introduces overcomplete lighting representations (e.g., OT3) to anchor interpolation in the latent lighting space and proposes a multiplicative neural renderer that models image formation more faithfully as the product of subject and illumination latents, enabling realistic lighting interpolation and continuous relighting.
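The multiplicative composition can be sketched as follows; the encoders, latent size, and 64x64 toy decoder are illustrative placeholders, not the architecture of Song et al.:

```python
import torch
import torch.nn as nn

class MultiplicativeRelighter(nn.Module):
    """Toy multiplicative neural renderer: the decoder sees the elementwise
    product of a subject code and a lighting code, mirroring the intuition that
    image formation is (roughly) reflectance x illumination."""

    def __init__(self, latent_dim=256):
        super().__init__()
        self.subject_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))
        self.light_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64), nn.Sigmoid(),
        )

    def forward(self, subject_img, light_ref_img):
        z_s = self.subject_enc(subject_img)          # who/what is in the image
        z_l = self.light_enc(light_ref_img)          # how it is lit
        z = z_s * z_l                                # multiplicative composition
        return self.decoder(z).view(-1, 3, 64, 64)

def interpolate_lighting(model, subject_img, light_a, light_b, t):
    """Continuous relighting: blend two lighting codes and decode."""
    z_s = model.subject_enc(subject_img)
    z_l = (1 - t) * model.light_enc(light_a) + t * model.light_enc(light_b)
    return model.decoder(z_s * z_l).view(-1, 3, 64, 64)
```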
2.3 Physics-Based and Volumetric Neural Rendering
Recent models, particularly in 3D or face applications, blend implicit learning with explicit physics-inspired mechanisms. VoLux-GAN (Tan et al., 2022) accumulates albedo, diffuse, and specular light contributions volumetrically along rays, conditioning on an arbitrary HDRI environment map. Spherical harmonics and learned transfer coefficients model light transport, enabling relighting under diverse high-dynamic-range illuminations while preserving and transferring geometry, albedo, and material properties across generated images (LumiGAN; Deng et al., 2023).
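A schematic of this volumetric compositing is sketched below, assuming a 2-band spherical-harmonics environment and a simple mirror-reflection specular proxy; the function and its shading model are illustrative and do not reproduce VoLux-GAN or LumiGAN exactly:

```python
import torch

def composite_relit_ray(sigmas, albedo, normals, view_dirs, sh_coeffs, deltas, spec_weight=0.2):
    """Accumulate a relit color along a batch of rays.

    sigmas:    (B, S)     volume densities at S samples per ray
    albedo:    (B, S, 3)  per-sample albedo
    normals:   (B, S, 3)  per-sample unit normals
    view_dirs: (B, 3)     unit viewing direction per ray
    sh_coeffs: (3, 4)     RGB coefficients of a 2-band spherical-harmonics environment
    deltas:    (B, S)     distances between consecutive samples
    """
    # Standard volume-rendering weights (alpha compositing of densities).
    alpha = 1.0 - torch.exp(-sigmas * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1
    )[:, :-1]
    weights = alpha * trans                                                   # (B, S)

    # Diffuse shading: evaluate the SH environment in the normal direction
    # (band 0 + band 1 only; SH constants folded into sh_coeffs for brevity).
    basis = torch.cat([torch.ones_like(normals[..., :1]), normals], dim=-1)   # (B, S, 4)
    diffuse = torch.einsum('bsk,ck->bsc', basis, sh_coeffs).clamp(min=0.0)    # (B, S, 3)

    # Crude specular proxy: reflect the view direction about the normal and
    # evaluate the same SH environment there.
    v = view_dirs[:, None, :].expand_as(normals)
    refl = v - 2.0 * (v * normals).sum(-1, keepdim=True) * normals
    spec_basis = torch.cat([torch.ones_like(refl[..., :1]), refl], dim=-1)
    specular = torch.einsum('bsk,ck->bsc', spec_basis, sh_coeffs).clamp(min=0.0)

    color = albedo * diffuse + spec_weight * specular          # per-sample radiance
    return (weights[..., None] * color).sum(dim=1)             # (B, 3) pixel color
```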
2.4 Diffusion Models and Score Distillation
Diffusion priors, including both 2D and video diffusion models, have become prominent. Models such as Neural Gaffer (Jin et al., 11 Jun 2024) and Lasagna (Bashkirova et al., 2023) fine-tune large-scale diffusion models for relighting, learning to interpret environment maps or textual prompts as conditioning signals. Score distillation sampling (SDS) approaches transfer the prior of a diffusion model to an explicit editing layer (Lasagna), restricting manipulations to layers that affect only lighting and thus preserving the underlying scene content.
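The SDS mechanism for lighting-only edits can be sketched as follows; `diffusion_eps` (a frozen, pretrained noise predictor), `alphas_cumprod`, and the additive `light_layer` parameterization are assumptions made for illustration rather than Lasagna's actual editing layer:

```python
import torch

def sds_lighting_step(diffusion_eps, alphas_cumprod, base_img, light_layer, cond, guidance=7.5):
    """One score-distillation update on a learnable lighting-edit layer.

    diffusion_eps(x_t, t, cond): frozen, pretrained noise predictor (hypothetical callable).
    base_img:    original image in [-1, 1], kept fixed (B, 3, H, W)
    light_layer: learnable tensor added to the image; only it receives gradients
    """
    edited = (base_img + light_layer).clamp(-1.0, 1.0)

    # Noise the edited image at a random timestep, as in diffusion training.
    t = torch.randint(20, len(alphas_cumprod) - 20, (edited.shape[0],), device=edited.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(edited)
    x_t = a_t.sqrt() * edited + (1.0 - a_t).sqrt() * noise

    with torch.no_grad():
        eps_cond = diffusion_eps(x_t, t, cond)
        eps_uncond = diffusion_eps(x_t, t, None)
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)   # classifier-free guidance

    # SDS surrogate loss: its gradient w.r.t. the light layer is w(t) * (eps_hat - noise),
    # the usual score-distillation direction; the diffusion model itself is never updated.
    grad = (1.0 - a_t) * (eps - noise)
    loss = (grad * edited).sum()
    loss.backward()
    return loss.detach()
```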
Diffusion models are trained to minimize noise prediction errors in latent space, conditioned on original images, environment/scribble/albedo maps, and, where relevant, material parameters or view/camera/ray information. Multi-view attention and consistent latent sharing (LightSwitch (Litman et al., 8 Aug 2025), A Diffusion Approach to Radiance Field Relighting (Poirier-Ginter et al., 13 Sep 2024), ROGR (Tang et al., 3 Oct 2025)) further enable consistent relighting across camera viewpoints, facilitating 3D understanding and reconstruction.
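A minimal sketch of such a conditional noise-prediction objective is shown below, assuming hypothetical `vae_encode` and `unet` callables; real systems add further conditions (albedo maps, scribbles, camera rays) in the same manner:

```python
import torch
import torch.nn.functional as F

def relighting_diffusion_loss(unet, vae_encode, image_gt_relit, image_orig, env_map_emb,
                              alphas_cumprod):
    """Noise-prediction loss for a relighting latent diffusion model.

    The network denoises the latent of the relit image while being conditioned on
    the original image's latent and an embedding of the target environment map.
    """
    z0 = vae_encode(image_gt_relit)                      # target latent (B, C, h, w)
    z_cond = vae_encode(image_orig)                      # conditioning latent

    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(z0)
    z_t = a_t.sqrt() * z0 + (1.0 - a_t).sqrt() * noise   # forward diffusion in latent space

    # Condition by channel-concatenating the original-image latent and feeding the
    # lighting embedding through cross-attention (the signature is illustrative).
    eps_hat = unet(torch.cat([z_t, z_cond], dim=1), t, context=env_map_emb)
    return F.mse_loss(eps_hat, noise)
```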
2.5 Intrinsic Decomposition and Physical Consistency
A distinct class of methods (Yang et al., 27 Sep 2024) employs intrinsic image factorization, estimating reflectance and shading maps and predicting new shading under explicit lighting control. This decompositional approach, combined with synthetic and real intrinsic-labeled datasets, ensures intermediate physical consistency, enabling explicit handling of arbitrary light positions and colors.
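The recomposition step can be sketched with a simple Lambertian point-light shading model; the formula and argument names are illustrative rather than the exact shading network of Yang et al.:

```python
import numpy as np

def relight_intrinsic(reflectance, normals, points, light_pos, light_color, ambient=0.1):
    """Recompose an image as reflectance x new shading under a point light.

    reflectance: (H, W, 3) estimated albedo in [0, 1]
    normals:     (H, W, 3) unit surface normals
    points:      (H, W, 3) per-pixel 3D positions (e.g. back-projected depth)
    light_pos:   (3,) new light position; light_color: (3,) RGB intensity
    """
    to_light = light_pos[None, None, :] - points
    dist2 = np.maximum((to_light ** 2).sum(-1, keepdims=True), 1e-6)
    l_dir = to_light / np.sqrt(dist2)

    # Lambertian shading with inverse-square falloff plus a small ambient term.
    n_dot_l = np.clip((normals * l_dir).sum(-1, keepdims=True), 0.0, None)
    shading = ambient + light_color[None, None, :] * n_dot_l / dist2

    return np.clip(reflectance * shading, 0.0, 1.0)
```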
3. Application Domains
Generative relighting methods support diverse applications:
- Image and Portrait Editing: Fine control over lighting, shadows, and moods for post-production and stylistic edits in portraiture and digital photography (Song et al., 2021, Cha et al., 18 Dec 2024).
- 3D Asset and Face Synthesis: Generation of relightable 3D assets for animation, VR/AR avatars, and digital humans, using physically informed decomposition and GAN frameworks (Tan et al., 2022, Deng et al., 2023, Rao et al., 15 Jul 2024).
- Scene and Object Insertion: Pipeline integration into neural radiance fields (NeRFs) for consistent object insertion and relighting in complex scenes, allowing mutual cast shadows and physically coupled illumination (Zhu et al., 21 Jun 2024, Tang et al., 3 Oct 2025).
- Data Augmentation and Authoring: Non-synthetic, physics-aware data augmentation for downstream learning, including classification and self-supervised representation learning (Forsyth et al., 2021).
- Video Relighting: Temporal consistency and harmonization in video relighting and background replacement through video generative models and domain-adaptive training (Zeng et al., 18 Aug 2025).
4. Dataset Design and Benchmarking
Recent progress owes much to the development of large-scale synthetic and real datasets:
- VIDIT (Helou et al., 2020) provides scenes rendered under multiple light directions and color temperatures for controlled training and benchmarking (Gafton et al., 2020).
- Synthetic datasets, such as those with thousands of HDRI-lit 3D object renderings and full intrinsic component supervision (Yang et al., 27 Sep 2024), support models in learning disentangled representations.
- Real datasets, e.g., Lonoff (Cui et al., 2022) and RSR (Yang et al., 27 Sep 2024), capture fine-grained light source toggling and varied material/shadow complexity, supplying demanding benchmarks for local and global relighting performance.
Benchmarking employs a range of photometric and perceptual metrics (PSNR, SSIM, LPIPS, MPS, FID, and local-FID) as well as user studies to validate qualitative plausibility and realism.
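A sketch of the standard photometric and perceptual metrics, using scikit-image and the `lpips` package (FID, local-FID, and MPS typically rely on dedicated toolkits and are omitted here):

```python
import torch
import lpips                                   # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def relighting_metrics(pred, gt):
    """pred, gt: NumPy float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)

    # LPIPS expects torch tensors in [-1, 1] with shape (N, 3, H, W).
    lpips_fn = lpips.LPIPS(net='alex')
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2.0 - 1.0
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```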
5. Extensions: Local, Text-Guided, and Interactive Control
Recent works advance beyond global or direction-only relighting to support:
- Local Control: Techniques to toggle individual light sources and spatial regions (switching lamps on/off, window light, etc.), using modular architectures and style-based modulation in GANs (Cui et al., 2022).
- Text-Guided Relighting: Pipelines that synthesize lighting and mood from natural language prompts, employing structured text-to-image/lighting synthesis and controlled diffusion (Cha et al., 18 Dec 2024, Bashkirova et al., 2023).
- Interactive Editing: User-in-the-loop relighting via scribbles or control maps that refine local lighting effects, with explicit albedo or normal guidance for geometry-aware edit propagation (Choi et al., 26 Nov 2024); a simplified sketch of this idea follows below.
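As a simplified illustration of scribble-driven, geometry-aware local editing (not the pipeline of Choi et al.), one can modulate an estimated shading layer inside a feathered scribble mask, weighted by how strongly each surface faces a chosen light direction:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scribble_relight(albedo, shading, normals, scribble_mask, light_dir,
                     gain=1.8, feather_sigma=15.0):
    """Brighten the scribbled region, weighted by orientation toward light_dir.

    albedo, shading: (H, W, 3); normals: (H, W, 3) unit vectors;
    scribble_mask: (H, W) binary user scribble; light_dir: (3,) unit vector.
    """
    # Feather the scribble so the edit falls off smoothly at its boundary.
    soft_mask = gaussian_filter(scribble_mask.astype(np.float32), feather_sigma)
    soft_mask = soft_mask / (soft_mask.max() + 1e-8)

    # Geometry-aware weighting: only surfaces oriented toward the chosen light
    # direction receive extra shading inside the feathered region.
    facing = np.clip((normals * light_dir[None, None, :]).sum(-1), 0.0, 1.0)
    boost = 1.0 + (gain - 1.0) * soft_mask * facing

    new_shading = shading * boost[..., None]
    return np.clip(albedo * new_shading, 0.0, 1.0)
```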
6. Limitations and Future Directions
Despite significant progress, challenges remain:
- Handling of complex, non-Lambertian phenomena (such as subsurface scattering, near-field lighting, volumetric effects) is limited in systems trained predominantly on diffuse/specular renderings.
- Multi-view consistency is non-trivial for diffusion-based relighting; recent advances leverage joint denoising and attention/classifier-free guidance, but further robustness is required for extreme viewpoint and material variation (Alzayer et al., 19 Dec 2024, Litman et al., 8 Aug 2025, Tang et al., 3 Oct 2025).
- Real-world deployment is currently constrained by the need for high-quality, large-scale data, and by the computational requirements of state-of-the-art diffusion models or 3D radiance field optimization.
- Extensions to fully dynamic video relighting, temporally coherent bidirectional manipulations, and broader editability in material, lighting context, and global illumination are promising directions.
A plausible implication is that future generative relighting systems will integrate richer physical priors, self-supervised learning from unlabeled data, explicit material guidance, and flexible user control, furnishing a unified toolkit for image, 3D, and video editing across a diverse array of scientific and industrial fields.