MPLI: Multi-Plane Light Image
- MPLI is a depth-aligned stack of RGB light images that captures photometrically accurate 3D illumination by encoding fixed-depth lighting attributes.
- It uses a physics-based formulation integrating point light intensity, color, and inverse-square law falloff, enabling dynamic multi-source illumination modeling.
- The approach supports advanced applications in neural video relighting and optical multiplexing by providing precise, temporally adaptive lighting control.
A Multi-Plane Light Image (MPLI) is a layered, depth-aligned visual representation for encoding photometrically accurate illumination attributes of a 3D scene. Unlike classical Multi-Plane Images (MPIs), which store view-dependent scene appearance (RGBA) for view synthesis, MPLIs are defined as stacks of per-plane RGB “Light Images” sampled at fixed depths, capturing 3D lighting (source positions, intensities, chromaticities) as projected irradiance distributions. Originally introduced in the context of fine-grained video relighting for generative models, the MPLI provides a physics-inspired, compact visual prompt for deep neural networks, supports arbitrary multi-source illumination, and enables temporally varying lighting control while efficiently generalizing to unseen lighting arrangements (Bian et al., 9 Nov 2025).
1. Definition and Mathematical Formulation
An MPLI consists of $K$ fronto-parallel “Light Images” $\{L_k\}_{k=1}^{K}$, each situated at a fixed camera-aligned depth $d_k$ within the scene frustum. Each $L_k$ encodes the total irradiance at depth $d_k$, combining all sources:

$$L_k(u, v) = \frac{1}{\lambda_1} \sum_{i=1}^{N} \frac{I_i \, \mathbf{c}_i}{\lVert \mathbf{x}_k(u, v) - \mathbf{p}_i \rVert^2 + \lambda_2}$$

where:
- $N$: number of point light sources
- $\mathbf{p}_i$: source 3D position
- $I_i$: scalar intensity
- $\mathbf{c}_i$: RGB color
- $\mathbf{x}_k(u, v)$: 3D position of the pixel $(u, v)$ in plane $k$
- $\lambda_1, \lambda_2$: global normalizers ensuring numerical compatibility with downstream models
The full MPLI is the ordered set $\{L_1, \dots, L_K\}$. The representation naturally supports multi-source superposition and temporal dynamism by providing distinct MPLIs for consecutive temporal blocks.
2. Encoding Illumination and Scene Geometry
Each per-plane image encodes physical illumination by integrating source-specific falloff (inverse-square law) and color mixing at sampled depths. The position of each light influences the per-pixel irradiance profile via the squared distance in world coordinates, while intensity and color modulate the magnitude and chromatic bias, respectively. The additive structure allows linear encoding of complex, multi-source lighting fields, and by updating MPLIs per time block, the method accommodates dynamic, time-varying sources.
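The per-plane encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the pinhole camera with a 60 degree vertical field of view, the function names (`pixel_positions`, `light_image`, `build_mpli`), and the `scale` and `eps` parameters standing in for the global normalizers are all assumptions made for the example.

```python
import numpy as np

def pixel_positions(depth, height, width, fov_y_deg=60.0):
    """Camera-space 3D position of every pixel on a fronto-parallel plane at `depth`."""
    # Half-extent of the frustum cross-section at this depth (pinhole camera assumed).
    half_h = depth * np.tan(np.radians(fov_y_deg) / 2.0)
    half_w = half_h * width / height
    ys = np.linspace(half_h, -half_h, height)    # top row has +y
    xs = np.linspace(-half_w, half_w, width)
    xx, yy = np.meshgrid(xs, ys)                 # each (H, W)
    zz = np.full_like(xx, depth)
    return np.stack([xx, yy, zz], axis=-1)       # (H, W, 3)

def light_image(depth, positions, intensities, colors,
                height=64, width=64, scale=1.0, eps=1e-3):
    """One RGB Light Image: sum of inverse-square contributions from all point lights.

    `scale` and `eps` are hypothetical stand-ins for the global normalizers.
    """
    x = pixel_positions(depth, height, width)                 # (H, W, 3)
    image = np.zeros((height, width, 3))
    for p, inten, c in zip(positions, intensities, colors):
        d2 = np.sum((x - np.asarray(p)) ** 2, axis=-1)        # squared distance (H, W)
        image += inten * np.asarray(c) / (d2[..., None] + eps)
    return image / scale

def build_mpli(depths, positions, intensities, colors, **kwargs):
    """Stack one Light Image per depth into an array of shape (K, H, W, 3)."""
    return np.stack([light_image(d, positions, intensities, colors, **kwargs)
                     for d in depths])

# One warm point light located on the second plane (z = 2).
depths = [1.0, 2.0, 3.0, 4.0]
mpli = build_mpli(depths,
                  positions=[(0.0, 0.0, 2.0)],
                  intensities=[1.0],
                  colors=[(1.0, 0.8, 0.6)])
print(mpli.shape)  # (4, 64, 64, 3)
```

Because the light sits at depth 2, the second plane receives the sharpest and brightest falloff pattern, while the other planes show progressively softer versions of it. This is how the stack conveys the source's depth.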
3. Relation to Multi-Plane Image (MPI) and Other Layered Representations
MPIs encode scene appearance as RGBA textures at fixed depths, composited back to front with the over operator for view interpolation and novel-view rendering (Srinivasan et al., 2019, Solovev et al., 2022, Völker et al., 2020, Han et al., 2022). Each MPI plane typically contains color $c_k$ and alpha (opacity) $\alpha_k$, and is rendered by warping the $K$ layers to the target view and compositing, with $k = 1$ the farthest plane:

$$C = \sum_{k=1}^{K} c_k \, \alpha_k \prod_{j=k+1}^{K} (1 - \alpha_j)$$
MPLI planes, in contrast, have only RGB irradiance channels and do not encode occlusion, instead capturing illumination distributions that are not view-dependent images but “lighting maps” independent of camera pose or occlusion. MPLIs can be injected as control signals for relighting, rather than layered for rendering per se. In (Bian et al., 9 Nov 2025), the MPLI, after compression (e.g., by a Video VAE), is injected as a latent visual cue into a frozen Video Diffusion Transformer, enabling direct illumination control.
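For contrast, the back-to-front over-compositing that classical MPIs rely on can be sketched as follows. The function name `composite_mpi` and the array layout are assumptions for illustration, and the planes are taken to be already warped to the target view. An MPLI has no alpha channel, so no equivalent of this step applies to it.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Back-to-front over-compositing of K RGBA planes.

    colors: (K, H, W, 3), alphas: (K, H, W, 1); index 0 is the farthest plane.
    """
    out = np.zeros_like(colors[0])
    for c, a in zip(colors, alphas):      # iterate far to near
        out = c * a + out * (1.0 - a)     # nearer plane partially covers the result
    return out

# Single-pixel example: an opaque red far plane behind a half-transparent green plane.
far_color = np.array([1.0, 0.0, 0.0]).reshape(1, 1, 3)
near_color = np.array([0.0, 1.0, 0.0]).reshape(1, 1, 3)
colors = np.stack([far_color, near_color])           # (2, 1, 1, 3)
alphas = np.array([1.0, 0.5]).reshape(2, 1, 1, 1)    # (2, 1, 1, 1)
result = composite_mpi(colors, alphas)
print(result[0, 0])  # half red, half green
```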
4. Implementation in Generative Video Relighting
The "RelightMaster" framework leverages MPLIs for high-fidelity, controllable video relighting (Bian et al., 9 Nov 2025). Key workflow:
- Four Light Images per four-frame temporal block
- Each set compressed into a latent vector using a Video VAE
- A Light Image Adapter (LIA), reusing backbone video transformer weights, transforms this latent into tokens that are injected into each block of a frozen Video Diffusion Transformer (DiT) alongside the video latents
- Only the LIA weights, 3D-attention, and a small LoRA adapter are fine-tuned. The DiT’s generative prior is preserved, avoiding catastrophic forgetting
Empirical results confirm control over illumination position, intensity, and color: scenes relit via MPLI-encoded prompts exhibit faithful highlight, shadow, and color reproduction, outperforming text-only prompting approaches in physical consistency and control granularity.
5. Multi-Source, Generalization, and Temporal Dynamics
The summation in the irradiance formula admits an arbitrary number of point light sources, since each source’s contribution is independent and the contributions combine linearly. Although training focuses on single-source data, this linear superposition supports compositional generalization to multi-source scenarios and to previously unseen source positions, intensities, and colors. Temporally adaptive lighting is realized by feeding a sequence of MPLIs, one per time block, enabling smooth cross-fades, dynamic lighting transitions, and moving shadows.
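Both properties can be checked numerically with a small sketch. The helper names (`plane_irradiance`, `mpli`), the square plane extent, and the `eps` regularizer are assumptions for the example. Note that exact superposition holds only when normalization is a fixed global constant, as assumed here; a data-dependent normalizer such as dividing by the per-image maximum would break it.

```python
import numpy as np

def plane_irradiance(depth, lights, size=32, extent=2.0, eps=1e-3):
    """RGB irradiance on the plane z = depth from (position, intensity, color) lights."""
    axis = np.linspace(-extent, extent, size)
    xx, yy = np.meshgrid(axis, axis)
    pts = np.stack([xx, yy, np.full_like(xx, depth)], axis=-1)   # (size, size, 3)
    img = np.zeros((size, size, 3))
    for pos, inten, col in lights:
        d2 = np.sum((pts - np.asarray(pos)) ** 2, axis=-1, keepdims=True)
        img += inten * np.asarray(col) / (d2 + eps)              # inverse-square falloff
    return img

def mpli(lights, depths=(1.0, 2.0, 3.0, 4.0)):
    """Stack of Light Images, shape (K, size, size, 3)."""
    return np.stack([plane_irradiance(d, lights) for d in depths])

warm = ((-1.0, 0.5, 2.0), 2.0, (1.0, 0.7, 0.4))
cool = ((1.0, -0.5, 3.0), 1.0, (0.4, 0.6, 1.0))

# Superposition: the two-light MPLI equals the sum of the single-light MPLIs.
both = mpli([warm, cool])
superposition_holds = np.allclose(both, mpli([warm]) + mpli([cool]))

# Temporal dynamics: one MPLI per time block, with the warm light sliding along x.
num_blocks = 5
sequence = np.stack([
    mpli([((x, 0.5, 2.0), 2.0, (1.0, 0.7, 0.4))])
    for x in np.linspace(-1.0, 1.0, num_blocks)
])
print(superposition_holds, sequence.shape)
```

The resulting `sequence` array has one MPLI per time block, so a moving or cross-fading light is expressed purely by changing the source parameters between blocks.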
6. Broader Context: Layered Light Representations and Optical MPLI
While the deep learning MPLI is focused on differentiable, learnable relighting, analogous concepts appear in computational and physical optics:
- In broadband diffractive optics, “multi-plane light-image” (MPLI) projection uses a single engineered phase mask to project distinct images at multiple spatial planes (and/or spectral bands), optimizing the mask profile to direct desired intensity distributions to target depths and colors (Meem et al., 2019). The underlying principle is joint spatio-spectral manipulation by the mask’s chromatic-dispersive and depth-propagation properties via the Fresnel integral. Fabricated devices can simultaneously project overt and covert (e.g., IR) images at different depths, with multi-plane efficiency up to 65%.
- In computational imaging, “deep learned optical multiplexing” techniques encode multi-depth (focal) information into single capture images, with neural decoding to recover multi-plane intensity stacks, an approach functionally parallel to software-based MPLI in compressing volumetric information into succinct observables and then reconstructing the stack by learned inversion (Cheng et al., 2019).
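The forward model behind the diffractive case, a single phase mask producing different intensity patterns at different depths, can be sketched numerically. This example only propagates a given mask and does not perform the mask optimization described in (Meem et al., 2019). It uses the angular spectrum method as a stand-in for the Fresnel integral, and a simple lens phase as a placeholder for an engineered mask. The grid size, pixel pitch, wavelength, and focal length are illustrative assumptions.

```python
import numpy as np

def propagate(field, wavelength, pitch, z):
    """Propagate a sampled square complex field a distance z (angular spectrum method)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)                        # spatial frequencies (cycles/m)
    fxx, fyy = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))       # axial wavenumber
    transfer = np.where(arg > 0, np.exp(1j * kz * z), 0.0) # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Illustrative parameters (assumptions): 256 x 256 mask, 2 um pixels, 500 nm light.
n, pitch, wavelength = 256, 2e-6, 0.5e-6
focal = 5e-3

# Placeholder mask: a thin-lens phase profile under unit-amplitude illumination.
coords = (np.arange(n) - n / 2) * pitch
xx, yy = np.meshgrid(coords, coords)
phase = -np.pi * (xx**2 + yy**2) / (wavelength * focal)
mask_field = np.exp(1j * phase)

# Intensity at three depths behind the same mask.
planes = [0.5 * focal, focal, 1.5 * focal]
intensities = [np.abs(propagate(mask_field, wavelength, pitch, z)) ** 2 for z in planes]
peaks = [img.max() for img in intensities]
print(peaks)
```

The same mask yields a sharply concentrated pattern at the focal plane and broader patterns at the other two depths. A mask designed for multi-plane projection would replace the lens phase with a profile optimized so that each target depth receives its own image.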
7. Applications, Limitations, and Future Directions
Applications
- Fine-grained neural video relighting: Scene-level, physically plausible, temporally coherent lighting edits with fine spatial control, as demonstrated in RelightMaster (Bian et al., 9 Nov 2025).
- Generalization to unseen lighting: Admits complex, dynamic, multi-source illumination fields, enabling a level of relighting control not feasible with text-only prompts.
- Physical-optical multiplexing: Used in anti-counterfeiting, holographic display, spectral/axial image multiplexing, and flat optical device engineering (Meem et al., 2019).
- Live microscopy and computational imaging: Single-shot multi-plane acquisition with neural recovery for high-speed focal stack imaging (Cheng et al., 2019).
Limitations
- MPLI as proposed in (Bian et al., 9 Nov 2025) assumes explicit, algebraically defined point lighting fields. Extending it to indirect illumination, non-point light models, or full global illumination scenarios can be challenging.
- No occlusion or volumetric scattering is modeled in the canonical MPLI; these effects must be handled elsewhere in the pipeline.
- The utility in general 3D view synthesis (not just relighting) is less established than the classical MPI’s role in novel-view rendering.
Future Directions
- Extension of MPLI to encode indirect lighting and global illumination effects by parameterizing more complex radiance transfer kernels.
- Physical implementation of reconfigurable, dynamic MPLI in nanophotonics or metasurface optics for real-time light field and relighting control.
- Fusion of MPLI with adaptive, learned geometry deformation as in SIMPLI (Solovev et al., 2022), or with dynamic temporal basis encoding as in temporal-MPI (Xing et al., 2021), to yield more compact, expressive, and physically consistent layered scene-light representations.
MPLI, as an explicit, depth-aligned illumination encoding, offers a bridge between physically informed light transport and neural rendering pipelines, supporting precise, compositional, and temporally rich control over scene relighting and novel illumination synthesis (Bian et al., 9 Nov 2025).