Multi-View Lighting Harmonization
- Multi-view lighting harmonization is the process of aligning illumination across images using explicit and implicit 3D representations to enforce photometric consistency.
- It utilizes techniques like 3D Gaussian splatting and multi-view diffusion models to address issues such as view-dependent lighting and exposure mismatches.
- Advancements in these methods support high-fidelity 3D reconstruction, novel-view synthesis, and interactive scene editing with real-time, scalable computation.
Multi-view lighting harmonization is the process of enforcing photometric consistency or plausible relighting across images or synthesized views of a 3D scene captured under varying, uncontrolled, or user-modified illumination. This functionality is foundational for high-fidelity 3D reconstruction, novel-view synthesis (NVS), object compositing, and interactive scene editing. Recent advances leverage explicit and implicit scene representations, multi-view generative models, material inference, and view-conditioned networks to achieve state-of-the-art harmonization quality. Methods target challenges including view-dependent illumination, device/camera exposure mismatches, spatially-varying light transport, and the demand for real-time or scalable computation.
1. Foundations and Problem Definition
Multi-view lighting harmonization seeks to resolve photometric inconsistencies in a set of images (typically with known poses) arising from non-uniform scene illumination, exposure differences, or temporal changes during capture. The objective is to generate a coherent, globally-consistent representation suitable for 3D rendering (e.g., NeRF, Gaussian Splatting), relighting, compositing, or interactive editing. This problem extends beyond classical photometric calibration due to complex, spatially-variant lighting, scene shadowing, and the entanglement of view-dependent effects with scene geometry and materials.
Broadly, harmonization can be categorized into:
- Intra-scene harmonization: Mapping all captured views to a consistent (e.g., canonical or user-specified) lighting domain.
- Decomposition-based harmonization: Separating per-light source contributions to allow independent manipulation and re-synthesis.
- Feature-based harmonization: Enforcing consistency directly in explicit (e.g., 3D Gaussians, mesh) or implicit (e.g., neural fields, multi-view diffusion) scene descriptors.
2. Explicit and Implicit Scene Representations
Modern harmonization frameworks operate predominantly on two classes of scene representation:
- Explicit: 3D proxy geometry reconstructed via multi-view stereo (MVS), point clouds, or recent explicit primitives (such as 3D Gaussians). These representations enable spatially-resolved reprojection, light transport simulation, and view synthesis pipelines rooted in graphics.
- Example: In "Free-viewpoint Indoor Neural Relighting from Multi-view Stereo," the pipeline exploits MVS meshes to estimate per-pixel irradiance and albedo, enabling both diffuse-specular decomposition and targeted relighting, with neural networks fusing feature buffers across view-projected composites (Philip et al., 2021).
- 3D Gaussian Splatting has emerged as a robust explicit basis, with each Gaussian encoding location, covariance, color, and optionally lighting features. Techniques such as Luminance-GS and MV-CoLight rely on such Gaussians for joint harmonization and relighting (Cui et al., 2 Apr 2025, Ren et al., 27 May 2025); a minimal data-structure sketch follows the table below.
- Implicit/Generative: Multi-view diffusion models, VAEs, or neural radiance fields (NeRFs) parameterize the scene as a set of learnable latent functions, enabling end-to-end training for harmonization, relighting, and geometric inference.
- Example: SimVS leverages a multi-view latent diffusion model, with harmonization achieved through diffusion regression in VAE feature space and cross-attention over reference and inconsistent lighting views (Trevithick et al., 2024).
| Representation | Example Papers | Key Components |
|---|---|---|
| 3D Gaussian Splatting | (Cui et al., 2 Apr 2025, Ren et al., 27 May 2025, Liang et al., 21 Jan 2026) | Gaussians with spatial, color, and lighting coefficients |
| Multi-view Diffusion | (Trevithick et al., 2024, Shim et al., 2024, Litman et al., 8 Aug 2025) | Latent UNet, cross-view attention, lighting encoding |
| MVS Mesh (Hybrid) | (Philip et al., 2021) | Proxy mesh, feature buffers |
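To make the explicit route concrete, the following is a minimal sketch of a lighting-aware Gaussian primitive. It is not any cited paper's implementation: field names such as `light_coeffs` are illustrative assumptions for attaching per-light HDR color coefficients to a Gaussian, in the spirit of LuxRemix-style decompositions.

```python
# Minimal sketch (illustrative, not a cited paper's API): one explicit 3D Gaussian
# primitive carrying the attributes harmonization/relighting pipelines typically attach.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class LightingAwareGaussian:
    position: np.ndarray      # (3,) world-space mean
    covariance: np.ndarray    # (3, 3) anisotropic spread
    opacity: float            # blending weight during splatting
    base_color: np.ndarray    # (3,) view-independent RGB (or SH DC term)
    # Hypothetical per-light HDR coefficients: one RGB triple per light source.
    light_coeffs: np.ndarray = field(default_factory=lambda: np.zeros((0, 3)))

    def shade(self, light_weights: np.ndarray) -> np.ndarray:
        """Ambient/base color plus a weighted sum of per-light contributions."""
        if self.light_coeffs.size == 0:
            return self.base_color
        return self.base_color + light_weights @ self.light_coeffs
```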
3. Model Architectures and Harmonization Pipelines
Diverse model architectures implement harmonization, each optimizing for fidelity, scalability, or editability.
3.1 2D-Only Methods
Early strategies targeted per-image or pairwise harmonization via color mapping, exposure correction, or tone-curve normalization. However, such approaches cannot enforce geometric consistency or shadow alignment across views.
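As a concrete illustration of this class of baseline, the sketch below fits a 3×3 color matrix by least squares that maps one view's colors onto a reference view at corresponding pixels. It is a generic formulation with illustrative names, not a specific published method, and it operates purely in 2D, which is why it cannot account for geometry or shadows.

```python
# Minimal sketch of a classical 2D-only baseline: fit a per-image 3x3 color matrix
# (least squares) mapping a source view's colors onto a reference view's colors.
import numpy as np

def fit_color_matrix(src_rgb: np.ndarray, ref_rgb: np.ndarray) -> np.ndarray:
    """src_rgb, ref_rgb: (N, 3) linear RGB samples at corresponding pixels."""
    M, *_ = np.linalg.lstsq(src_rgb, ref_rgb, rcond=None)  # (3, 3) solution
    return M

def harmonize_image(image: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Apply the fitted matrix to an (H, W, 3) image and clip to the valid range."""
    return np.clip(image.reshape(-1, 3) @ M, 0.0, 1.0).reshape(image.shape)
```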
3.2 3D- and View-Adaptive Pipelines
- Luminance-GS (Cui et al., 2 Apr 2025): Incorporates per-view 3×3 color matrices and global + view-bias tone curves (parametric via power-law/S-curve priors) to remap colors outside the 3DGS pipeline. The model jointly learns these remapping modules and 3DGS colors under strong regularization, optimizing both geometric and photometric losses for reference and enhanced view targets (a remapping sketch follows this list).
- MV-CoLight (Ren et al., 27 May 2025): Employs a two-stage pipeline: a per-view 2D harmonizer using a Swin-transformer to re-illuminate composites, followed by a 3D Gaussian-based compositor leveraging a Hilbert curve mapping to align 2D features and 3D Gaussians, achieving strict color and shadow consistency.
- LuxRemix (Liang et al., 21 Jan 2026): Decomposes illumination into per-light OLAT and ambient passes using pretrained diffusion-transformer (DiT) models with LoRA adapters for masked editing. Multi-view harmonization propagates these decompositions with a multi-view diffusion-U-Net, and relighting is achieved by augmenting each Gaussian with per-light HDR color coefficients, facilitating interactive re-illumination.
- SimVS (Trevithick et al., 2024): Simulates diverse lighting changes via generative video diffusion, generating large-scale synthetic data with realistic world inconsistencies for supervised harmonization network training. A multi-view latent diffusion U-Net with reference and inconsistent input streams outputs all reference-lit views.
- MVLight (Shim et al., 2024): Implements a light-conditioned multi-view diffusion prior, with explicit per-batch HDR light embeddings injected at every U-Net block. Score-distillation sampling (SDS) supervises 3D NeRF fields in normal, albedo, and color modalities, ensuring joint geometric and illumination fidelity suitable for prompt-driven 3D synthesis and relighting.
- LightSwitch (Litman et al., 8 Aug 2025): Utilizes a finetuned U-Net latent diffusion model with material-per-view conditioning and scalable cross-view self-attention, allowing view-consistent relighting to arbitrary target illumination at scale. Scalability is achieved via batched stochastic reshuffling at each diffusion step.
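The view-adaptive remapping idea behind Luminance-GS can be illustrated with a minimal PyTorch sketch. The parameterization below (a learnable per-view 3×3 matrix plus a global power-law tone-curve exponent with a per-view bias) and all module and parameter names are assumptions for exposition, not the paper's code.

```python
# Minimal sketch of view-adaptive color remapping applied outside the 3DGS renderer.
import torch
import torch.nn as nn

class ViewAdaptiveRemap(nn.Module):
    def __init__(self, num_views: int):
        super().__init__()
        eye = torch.eye(3).expand(num_views, 3, 3).clone()
        self.color_mats = nn.Parameter(eye)                           # per-view 3x3 color matrices
        self.global_gamma = nn.Parameter(torch.zeros(1))              # global tone-curve exponent (log-space)
        self.view_gamma_bias = nn.Parameter(torch.zeros(num_views))   # per-view bias on the exponent

    def forward(self, rendered: torch.Tensor, view_idx: int) -> torch.Tensor:
        """rendered: (H, W, 3) linear colors splatted from the shared 3DGS model."""
        x = rendered.reshape(-1, 3) @ self.color_mats[view_idx].T
        gamma = torch.exp(self.global_gamma + self.view_gamma_bias[view_idx])
        x = x.clamp(min=1e-6) ** gamma                                # simple power-law tone curve
        return x.reshape(rendered.shape).clamp(0.0, 1.0)
```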
4. Loss Functions, Consistency Mechanisms, and Training Objectives
Multi-view harmonization models employ a hierarchy of objectives:
- Photometric/Structural Losses: standard L1/L2, DSSIM, and LPIPS terms between synthesized and target images (PSNR is typically reported as an evaluation metric rather than optimized directly); a combined-loss sketch follows this list.
- Curve and Color Regularization: Luminance-GS applies curve-regularization to enforce smoothness and histogram alignment for per-view tone curves, as well as a total-variation penalty on discrete mappings (Cui et al., 2 Apr 2025).
- Spatial/Temporal Consistency: Smoothness penalties (e.g., Zero-DCE-inspired), explicit multi-view stability loss terms (e.g., consistency across O_diffuse in (Philip et al., 2021)), and KNN-based Gaussian smoothing for per-light coefficients (Liang et al., 21 Jan 2026).
- Multi-view Attention/Batching: Material- and view-conditioned self-attention layers permit direct information flow across all image tokens/views (LightSwitch, MVLight). Stochastic reshuffling and minibatching (LightSwitch) mitigate the quadratic cost of attention as the number of views grows.
- Relighting and Decomposition Objectives: LuxRemix jointly optimizes OLAT fidelity, composition consistency, and spatial smoothness on per-Gaussian light coefficients, coupled with differentiable tone-mapping (Liang et al., 21 Jan 2026).
- Score-based Supervision: MVLight leverages lighting-aware SDS, supervising the rendered modalities with gradients from the light-conditioned diffusion prior to align rendered and denoised latent statistics (Shim et al., 2024).
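A minimal sketch of how several of these objectives might be combined in a single training step is given below. The weights, the discrete tone-curve parameterization, and the KNN smoothness term are illustrative assumptions rather than any single paper's exact formulation.

```python
# Sketch of a combined harmonization objective: photometric term, total-variation
# penalty on a discrete tone curve, and KNN smoothness on per-light Gaussian colors.
import torch
import torch.nn.functional as F

def harmonization_loss(pred, target, tone_curve, light_coeffs, knn_idx,
                       w_tv=0.01, w_smooth=0.1):
    """
    pred, target : (B, 3, H, W) rendered and reference images
    tone_curve   : (K,) discretized per-view tone-mapping curve
    light_coeffs : (N, L, 3) per-Gaussian, per-light HDR color coefficients
    knn_idx      : (N, k) indices of each Gaussian's spatial neighbors
    """
    # Photometric term (L1 here; DSSIM/LPIPS terms would typically be added).
    photo = F.l1_loss(pred, target)
    # Total-variation penalty keeps the discrete tone curve smooth.
    tv = (tone_curve[1:] - tone_curve[:-1]).abs().mean()
    # KNN smoothness: neighboring Gaussians should have similar per-light colors.
    neighbor_coeffs = light_coeffs[knn_idx]                   # (N, k, L, 3)
    smooth = (light_coeffs.unsqueeze(1) - neighbor_coeffs).pow(2).mean()
    return photo + w_tv * tv + w_smooth * smooth
```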
5. Datasets, Benchmarks, and Quantitative Evaluation
- Synthetic Multi-light/Multi-view Datasets: Many recent systems (LuxRemix, MV-CoLight) rely on large-scale synthetic benchmarks, with thousands of procedurally-generated scenes under controlled light layouts, enabling robust benchmarking and generalization (Liang et al., 21 Jan 2026, Ren et al., 27 May 2025).
- Real-World Casual Captures: Evaluation on unconstrained cell-phone panoramas, real estate tours, and in-the-wild multi-view sequences, often requiring accurate pose estimation (e.g., COLMAP).
- Metrics: PSNR and SSIM (higher is better), LPIPS (lower is better), and, for text-conditional pipelines, CLIP score and user studies; a computation sketch follows this list.
- Performance: LightSwitch achieves PSNR=26.01, SSIM=0.888, LPIPS=0.216 on BlenderVault (Litman et al., 8 Aug 2025); Luminance-GS attains PSNR≈20.98, SSIM≈0.707, LPIPS≈0.357 (reference harmonization) (Cui et al., 2 Apr 2025, Trevithick et al., 2024); LuxRemix delivers PSNR/SSIM/LPIPS of 30.76/0.867/0.0907 for multi-view harmonization (Liang et al., 21 Jan 2026). MV-CoLight outperforms prior compositing baselines with PSNR=30.29, SSIM=0.960, LPIPS=0.030 at ~1 s per full scene (Ren et al., 27 May 2025).
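The image-space metrics can be computed with standard libraries, as in the sketch below. It assumes (H, W, 3) float images in [0, 1] and recent scikit-image and lpips packages; exact crop and normalization conventions vary between the cited works.

```python
# Standard PSNR/SSIM/LPIPS evaluation for one predicted/ground-truth view pair.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance, lower is better

def _to_lpips_tensor(x: np.ndarray) -> torch.Tensor:
    # (H, W, 3) in [0, 1] -> (1, 3, H, W) in [-1, 1], as expected by lpips
    return torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1

def evaluate_view(pred: np.ndarray, gt: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    with torch.no_grad():
        lp = lpips_fn(_to_lpips_tensor(pred), _to_lpips_tensor(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```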
6. Interactive Relighting, Compositing, and Editing
A defining feature of advanced harmonization frameworks is interactive control and editing support:
- Per-light Manipulation: LuxRemix enables individual lights to be turned on/off, chromatically adjusted, or scaled in intensity interactively by editing per-light Gaussian coefficients (see the sketch after this list).
- Fast Inference: Luminance-GS enables real-time (~150 FPS) rendering under harmonized colors, and LuxRemix achieves GPU-bound updates at 20–30 FPS for full 3D relighting (Cui et al., 2 Apr 2025, Liang et al., 21 Jan 2026).
- Object Compositing: MV-CoLight introduces a compositional pipeline for harmonizing inserted objects via 2D harmonization followed by 3D color-space fusion, ensuring that shadows and highlights remain scene-consistent as the camera moves (Ren et al., 27 May 2025).
- Generative Scene Synthesis: MVLight integrates relightable text-to-3D synthesis, where editability is achieved by leveraging the explicit decoupling of albedo, normals, and lighting during generation (Shim et al., 2024).
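A minimal sketch of such per-light editing on top of a per-light Gaussian decomposition is shown below. Array names and editing controls are illustrative assumptions, not LuxRemix's API; the output would be fed to the splatting renderer before tone mapping.

```python
# Sketch of interactive relighting by rescaling per-light Gaussian HDR coefficients.
import numpy as np

def relight(gaussian_light_coeffs: np.ndarray,
            intensities: np.ndarray,
            tints: np.ndarray,
            ambient: np.ndarray) -> np.ndarray:
    """
    gaussian_light_coeffs : (N, L, 3) HDR contribution of light l to Gaussian n
    intensities           : (L,) per-light scale factors (0 turns a light off)
    tints                 : (L, 3) per-light RGB multipliers for chromatic edits
    ambient               : (N, 3) view-independent ambient pass
    Returns (N, 3) HDR per-Gaussian colors.
    """
    weighted = gaussian_light_coeffs * (intensities[:, None] * tints)[None]  # (N, L, 3)
    return ambient + weighted.sum(axis=1)
```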
7. Limitations and Prospects
- Geometry/MVS Artifacts: Approaches reliant on explicit proxy geometry or MVS (e.g., (Philip et al., 2021)) are limited by mesh quality, failing on thin, reflective, or non-Lambertian surfaces.
- Synthetic-to-Real Domain Gaps: Systems extensively trained on synthetic data (MV-CoLight) may exhibit color bias or miss fine real-world detail (Ren et al., 27 May 2025).
- Physical Light Source Modeling: Most frameworks do not infer explicit source locations or precise photometric lighting parameters, instead enforcing consistency through appearance-based decomposition; future directions include explicit source estimation and visible/invisible light differentiation.
- Scaling and Coverage: Scaling fully-diffusion-based approaches to hundreds of images is an active area, partially addressed by batching/shuffling mechanisms in LightSwitch (Litman et al., 8 Aug 2025).
- Material and BRDF Limitations: While modern methods integrate inferred material cues (e.g., LightSwitch, MVLight), highly anisotropic or transparent BRDFs remain challenging. More flexible and physically-grounded reflectance models may be needed for complex scenes.
References
- "Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment" (Cui et al., 2 Apr 2025)
- "LuxRemix: Lighting Decomposition and Remixing for Indoor Scenes" (Liang et al., 21 Jan 2026)
- "LightSwitch: Multi-view Relighting with Material-guided Diffusion" (Litman et al., 8 Aug 2025)
- "SimVS: Simulating World Inconsistencies for Robust View Synthesis" (Trevithick et al., 2024)
- "Free-viewpoint Indoor Neural Relighting from Multi-view Stereo" (Philip et al., 2021)
- "MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation" (Ren et al., 27 May 2025)
- "MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion" (Shim et al., 2024)