Multi-plane Light Imaging (MPLI)
- Multi-plane Light Imaging (MPLI) is a layered representation that encodes scene and lighting information as a stack of depth-aligned planes for versatile rendering and relighting.
- It spans both physical (diffractive optics) and computational (neural rendering) realizations, using techniques such as homography warping, alpha compositing, and adaptive plane interactions.
- Applications span AR, holographic imaging, microscopy, and video relighting, demonstrating efficient, real-time synthesis with low crosstalk and high imaging fidelity.
A Multi-plane Light Image (MPLI) generalizes classic multi-layer visual representations by encoding scene or lighting information as a stack of planes aligned along depth. MPLI variants have been implemented across diffractive optics, scene relighting, computational photography, and light field synthesis. The unifying principle is that each “plane” encodes a 2D spatial function (irradiance, color, density, mask, etc.) at its associated depth, with application-specific rendering, inversion, or compositing methods. The structural basis and modern mathematical formalism of MPLI enable precise multi-depth, multi-band, and multi-directional control at both the hardware and software levels.
1. Mathematical Foundations of Multi-plane Light Imaging
MPLI representations instantiate the scene or light field as $N$ planes at increasing depths $d_1 < d_2 < \cdots < d_N$, each carrying a spatial map $f_i(u, v)$. Depending on context, $f_i$ may refer to color (MPI), opacity/density, physical irradiance (as in relighting), or features. Rendering a view from an MPLI proceeds via homography-based warping and alpha compositing:

$$
\hat{C}(u, v) = \sum_{i=1}^{N} c_i\big(\mathcal{H}_i(u, v)\big)\,\alpha_i\big(\mathcal{H}_i(u, v)\big) \prod_{j=1}^{i-1} \Big(1 - \alpha_j\big(\mathcal{H}_j(u, v)\big)\Big),
$$

where $\mathcal{H}_i$ is the homography associated with plane $i$, and $c_i$ and $\alpha_i$ are the color and opacity functions for that plane; the product accumulates the transmittance of the nearer planes $j < i$. For physical light modeling, such as in RelightMaster (Bian et al., 9 Nov 2025), each plane encodes per-pixel irradiance:

$$
E_i(u, v) = \sum_{k} \frac{I_k \, \mathbf{c}_k}{\big\| \mathbf{x}_i(u, v) - \mathbf{p}_k \big\|^{2}},
$$

with $\mathbf{p}_k$ (3D position), $I_k$ (intensity), and $\mathbf{c}_k$ (color) for light $k$, and $\mathbf{x}_i(u, v)$ the 3D location on plane $i$.
Computationally, rendering or propagating light from an MPLI often requires layerwise transforms (e.g., Fresnel integral for optics, volume rendering for visual planes), incorporating wavelength, viewpoint, or lighting parameterization as needed.
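To make the compositing and irradiance formulas above concrete, here is a minimal NumPy sketch (illustrative names and shapes, not reference code from any cited paper); homography warping is assumed to have been applied already, so the planes are pre-aligned to the target view:

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Alpha-composite pre-warped planes, indexed front (0) to back (N-1).
    colors: (N, H, W, 3) and alphas: (N, H, W, 1), both in [0, 1]."""
    out = np.zeros(colors.shape[1:])           # accumulated color, (H, W, 3)
    transmittance = np.ones(alphas.shape[1:])  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):           # front-to-back traversal
        out += transmittance * a * c           # c_i * alpha_i * prod_{j<i}(1 - alpha_j)
        transmittance *= 1.0 - a
    return out

def light_plane_irradiance(xyz_plane, light_pos, intensity, color):
    """Per-pixel irradiance on one plane from a single point light.
    xyz_plane: (H, W, 3) positions x_i(u, v); light_pos: (3,); color: (3,)."""
    d2 = np.sum((xyz_plane - light_pos) ** 2, axis=-1, keepdims=True)
    return intensity * color / np.maximum(d2, 1e-8)  # inverse-square falloff
```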
2. Physical and Algorithmic Realizations
Optics: Broadband Diffractive Optical Elements
Physically-encoded MPLI uses a multi-level phase profile patterned (e.g., via grayscale lithography) onto a substrate to produce a Broadband Diffractive Optical Element (BDOE) (Meem et al., 2019). The output field at each depth plane $z$ is computed via the Fresnel diffraction integral:

$$
U(x', y'; z, \lambda) = \frac{e^{ikz}}{i\lambda z} \iint T(x, y)\, \exp\!\left(\frac{ik}{2z}\Big[(x' - x)^2 + (y' - y)^2\Big]\right) dx\, dy
$$

Here, $T(x, y) = \exp\!\big(i\,\frac{2\pi}{\lambda}\,(n - 1)\,h(x, y)\big)$ encodes the BDOE's complex field modulation through its height profile $h(x, y)$ (thin-element approximation), and $k = 2\pi/\lambda$. The topographic profile $h(x, y)$ is optimized (using direct binary search or gradient methods) to generate prescribed images at multiple depths $z$ and/or wavelengths $\lambda$; a propagation sketch follows the parameter list below. Key fabrication parameters include:
- Pixel pitch: 10–20 μm
- Height levels: up to 100 (max height ∼2.6 μm)
- Transmission/reflection: >96% (VIS), ∼85% (NIR)
- Crosstalk (SSIM): <0.1 (spectral), 0.17–0.55 (multi-plane)
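As a complement to the formula above, the sketch below is a minimal thin-element simulation (NumPy, FFT-based Fresnel transfer-function method; function names and the unit-amplitude illumination are assumptions, not the fabrication code of Meem et al.) of the intensity a height-profile BDOE produces at a chosen depth:

```python
import numpy as np

def fresnel_propagate(field, wavelength, z, pitch):
    """Propagate a complex field by distance z (meters) with the paraxial
    Fresnel transfer-function method; pitch is the pixel pitch in meters."""
    ny, nx = field.shape
    k = 2 * np.pi / wavelength
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    H = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def bdoe_intensity(height, wavelength, n_index, z, pitch):
    """Intensity at depth z behind a BDOE with height map `height` (meters)
    and material index n_index, under unit-amplitude plane-wave illumination."""
    phase = 2 * np.pi / wavelength * (n_index - 1) * height  # thin-element phase
    return np.abs(fresnel_propagate(np.exp(1j * phase), wavelength, z, pitch)) ** 2
```

For a multi-plane, multi-band target, the same height map is propagated to every prescribed (z, λ) pair, and the aggregate similarity to the target images forms the merit function optimized in Section 4.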
Computational: Neural Volume and Image Synthesis
Software-based MPLI (a superset including classical MPI) leverages deep convolutional or transformer architectures to build and refine rich layered scene representations:
- Network predicts per-plane features: color, density, irradiance, or attention masks
- Adaptive depth positioning and inter-plane interactions (e.g., via self-attention and masking (Han et al., 2022))
- Differentiable volume rendering or compositing for novel view/relighting synthesis
- Support for real and synthetic data, arbitrary viewpoint, and even time/lighting variation (Temporal-MPI (Xing et al., 2021), RelightMaster (Bian et al., 9 Nov 2025))
In some formulations, the MPLI encodes lighting cues as a stack of 2D images with precise registration to the content stream, permitting plug-and-play “visual prompts” for advanced diffusion models.
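As a concrete instance of the per-plane prediction step listed above, here is a minimal PyTorch sketch (layer sizes and names are illustrative, not drawn from any cited architecture) of a head that converts a shared feature map into the color/opacity stack consumed by the compositing rule of Section 1:

```python
import torch
import torch.nn as nn

class MPIHead(nn.Module):
    """Map a (B, C, H, W) feature volume to per-plane color and opacity,
    the generic output format of learned MPLI/MPI pipelines."""
    def __init__(self, feat_ch=64, n_planes=32):
        super().__init__()
        self.n_planes = n_planes
        self.to_rgba = nn.Conv2d(feat_ch, n_planes * 4, kernel_size=3, padding=1)

    def forward(self, feats):
        rgba = self.to_rgba(feats)                   # (B, 4N, H, W)
        b, _, h, w = rgba.shape
        rgba = rgba.view(b, self.n_planes, 4, h, w)  # split into N planes
        color = torch.sigmoid(rgba[:, :, :3])        # per-plane RGB in [0, 1]
        alpha = torch.sigmoid(rgba[:, :, 3:])        # per-plane opacity in [0, 1]
        return color, alpha
```

In adaptive variants, a second branch predicts scene-specific plane depths or attention masks that modulate inter-plane interactions (Han et al., 2022).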
3. Representative Implementations and Applications
| System / Context | Plane Content | Use Case |
|---|---|---|
| BDOE (Meem et al., 2019) | Phase relief (height) | Volumetric/spectral projection |
| RelightMaster (Bian et al., 9 Nov 2025) | Plane irradiance | Video relighting |
| Temporal-MPI (Xing et al., 2021) | Basis MPIs, per-frame | Dynamic scene synthesis |
| Cross-MPI (Zhou et al., 2020) | Plane-aware attention | Super-resolution stereo |
| MMPI (He et al., 2023) | Multiple MPIs + blending | 360°, robust NeRF |
| Learned optical multiplexing (Cheng et al., 2019) | Multiplexed intensity | Multi-focal plane microscopy |
Applications of MPLI span:
- Volumetric AR and depth-layered displays (physical, computational)
- Efficient, crosstalk-minimized holographic and security imaging (multi-band/multi-plane BDOEs)
- Video relighting with explicit spatio-temporal light prompts (RelightMaster, DiT adaptation)
- High-speed, high-fidelity multi-focal plane microscopy (deep learned LED multiplexing)
- Unbounded scene and long-trajectory radiance field compression (MMPI)
- Realistic optical flow, segmentation, and cross-modal synthesis tasks (MPI-Flow, Cross-MPI)
4. Design, Optimization, and Training Strategies
Physical MPLI design involves:
- Jointly optimizing phase profiles or multiplexing weights to match target patterns at multiple depths and/or wavelengths (DBS, iterative gradient methods; see the DBS sketch after this list)
- Manufacturing constraints: avoidance of subwavelength features, discrete height quantization, index-matched coatings for flat optics, and tolerance analysis for fabrication errors (height-error standard deviations ≲65 nm are readily tolerated, with <20% efficiency loss up to σ≈100 nm (Meem et al., 2019))
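Below is a minimal direct-binary-search (DBS) sketch, the greedy per-pixel search named in the first item above (the interface is hypothetical; `merit` would typically evaluate the Fresnel model of Section 2 against all target planes):

```python
import numpy as np

def direct_binary_search(height, levels, merit, max_sweeps=10, seed=0):
    """Greedily perturb one pixel of the height map at a time, keeping a
    trial height level only if the caller-supplied merit(height) improves."""
    rng = np.random.default_rng(seed)
    best = merit(height)
    for _ in range(max_sweeps):
        improved = False
        for idx in rng.permutation(height.size):   # visit pixels in random order
            i, j = divmod(idx, height.shape[1])
            old = height[i, j]
            height[i, j] = rng.choice(levels)      # try a random allowed level
            score = merit(height)
            if score > best:
                best, improved = score, True       # keep the improvement
            else:
                height[i, j] = old                 # revert the change
        if not improved:                           # a full sweep made no progress
            break
    return height, best
```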
Computational MPLI construction employs:
- Architectural innovations for plane- and layer-wise operations (3D CNNs, self-attention, residual masking)
- Adaptive or data-driven allocation of depth positions and plane content (scene-specific adjustment modules, learned attention masks)
- Plane- or basis-parameterized lighting or appearance bases for temporal, lighting, or view consistency (Temporal-MPI, MPIs with lighting coefficients)
- Data generation: synthetic “warp-back” pipelines for in-the-wild view synthesis, or closed-loop simulation/data for physical hardware (microscopy, optics)
Optimization targets include direct reconstruction loss, perceptual or frequency-consistency terms, crosstalk minimization, and efficiency metrics. In physical MPLI, figures of merit such as imaging efficiency, transmission, and structural similarity quantify optical performance (Meem et al., 2019). In learned MPLI, PSNR, SSIM, LPIPS, or custom rendering losses are used (Bian et al., 9 Nov 2025, Han et al., 2022); a sketch of such a composite objective follows.
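As an illustration of the learned-MPLI objectives just mentioned, a short PyTorch sketch combining a pixel term with an LPIPS perceptual term (weights are illustrative; `lpips` is the standard perceptual-similarity package and is assumed to be installed):

```python
import torch
import lpips  # pip install lpips

class MPLIRenderLoss(torch.nn.Module):
    """Pixel (L1) + perceptual (LPIPS) reconstruction objective for a
    rendered view against its ground-truth frame."""
    def __init__(self, w_pix=1.0, w_perc=0.1):
        super().__init__()
        self.w_pix, self.w_perc = w_pix, w_perc
        self.perc = lpips.LPIPS(net='vgg')  # frozen VGG-based LPIPS network

    def forward(self, pred, target):        # (B, 3, H, W) images in [0, 1]
        pix = torch.nn.functional.l1_loss(pred, target)
        perc = self.perc(pred * 2 - 1, target * 2 - 1).mean()  # LPIPS expects [-1, 1]
        return self.w_pix * pix + self.w_perc * perc
```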
5. Performance Characteristics and Limitations
Physical MPLI devices (multi-band BDOEs) achieve:
- Measured imaging efficiency: 54% (multi-plane, 400–700 nm, experimental), 60–70% (single-plane)
- Absolute throughput: >96% (visible), ~85% (NIR)
- Multi-plane crosstalk: SSIM <0.2 (sim), up to 0.55 (exp.; minimized by smaller pitches, larger plane separation)
- Scalable, mass-producible, and robust to fabrication error
Learned/algorithmic MPLI systems achieve:
- Real-time or tens-of-milliseconds-level synthesis with compact model sizes (Temporal-MPI: 0.008 s/frame, 0.48 GB vs. 5.4 GB for 3DMaskVol21 (Xing et al., 2021))
- State-of-the-art or superior visual quality on multiple benchmarks (e.g., PSNR 34.2, SSIM 0.966 on Ken Burns (Han et al., 2022))
- Near-seamless generalization to arbitrary lighting, time, or view distributions (RelightMaster multi-source, temporal control (Bian et al., 9 Nov 2025))
- Limitations: explicit view-dependent effects still require extensions (spherical-harmonic radiance, learned per-layer lighting), and thin or sloped geometries may need more layers or adaptive merging (Solovev et al., 2022)
6. Extensions, Generalizations, and Research Outlook
Recent research broadens MPLI beyond fixed-RGBA stacking:
- Spherical-harmonic or per-plane lighting profiles for fine relighting and photorealism
- Adaptive, attention-driven depth allocation for efficient plane utilization (Han et al., 2022)
- Multiple, directionally-oriented MPIs (MMPI) and adaptive per-voxel blending for robust 360° or long-range scenes (He et al., 2023)
- Physical extensions: multi-level, multi-band, and reflective elements, as well as programmable optical circuit implementations (e.g., multi-plane light conversion for spatial mode sorting (Kupianskyi et al., 2022))
A plausible implication is that further hybridization of physical and neural MPLI systems—integrating optical encoding and learned decoding—will unlock real-time, multi-modal, and dynamically controllable imaging with broad impact across sensing, AR/VR, and scientific imaging. The modularity and adaptability of the multi-plane paradigm position it as a central tool in high-fidelity computational photography, efficient scene relighting, and high-dimensional optical systems.