Lighting Encoders: Methods & Applications

Updated 22 November 2025
  • Lighting encoders are algorithms that convert complex lighting data into compact representations, enabling control and estimation in diverse applications.
  • They employ methods ranging from classical LFSR-based and run-length limited codes to deep neural autoencoders and transformer architectures.
  • Practical designs balance metrics like DC-balance, PSNR, and task success rates, addressing challenges in VLC, relighting, robotics, and compression.

A lighting encoder is a technical module or algorithm designed to parameterize, extract, or impose lighting representations—spanning from bitstreams for visible light communication to continuous high-dimensional descriptions of real-world illumination—suitable for control, estimation, harmonization, or efficient transmission in both hardware and neural pipelines. Lighting encoders feature prominently across research areas including signal/communications engineering, computer vision, real-time graphics, computational photography, AR/VR, and robotics. The precise structure and role of a lighting encoder vary by application domain: in VLC it refers to data randomization and channel code constructs for flicker/dimming mitigation; in neural graphics and relighting, it encompasses architectures that encode illumination from images or scenes, producing compact embeddings or volumetric fields that drive downstream tasks.

1. Lighting Encoders in Visible Light Communication: DC-Balanced and Flicker-Mitigation Codes

Lighting encoders are foundational in visible light communication (VLC) systems, especially for flicker-free beaconing and spatially/temporally uniform illumination. Classical run-length limited (RLL) codes limit bit run-lengths but reduce code rate or preclude soft-FEC. Modern non-RLL strategies leverage randomized pre-processing and channel coding to guarantee DC balance and short-run behavior over short frames without incurring heavy rate loss.

In the non-RLL beacon architecture (Nguyen et al., 2018, Nguyen et al., 2019), the lighting encoder comprises:

  • Pre-scrambler implemented as an LFSR with polynomial $P(x) = x^{15} + x^{14} + 1$ or $P(x) = x^4 + x^3 + 1$, which whitens the input bits, narrowing the output bit-1 fraction to approximately 41.25–63.75%, comparable to much longer blocklengths (a minimal scrambler sketch follows this list).
  • Polar encoder (systematic or non-systematic) with parameters $N = 256$, $K = 158$, achieving a code rate $R \approx 0.617$ and high hardware efficiency (see Tables 3–4 in Nguyen et al., 2018).
  • The combined construction minimizes the maximum run-length per codeword (empirically $< 30$ for $N = 256$), permitting baseband OOK modulation at $f_\text{mod} \gg 2000$ Hz, far above the human-visible flicker threshold.
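
To make the pre-scrambler stage concrete, the following minimal Python sketch implements a Fibonacci-style additive scrambler for $P(x) = x^{15} + x^{14} + 1$. The seed value, register wiring, and bit ordering are illustrative assumptions rather than the cited hardware design.

```python
def lfsr_scramble(bits, taps=(15, 14), seed=0x7FFF):
    """Additive (Fibonacci-style) LFSR scrambler sketch for P(x) = x^15 + x^14 + 1.

    Each input bit is XORed with the feedback bit of the register, which
    whitens long runs before channel coding. `seed` is a hypothetical
    non-zero initial state.
    """
    degree = max(taps)
    mask = (1 << degree) - 1
    state = seed & mask
    scrambled = []
    for b in bits:
        # Feedback bit = XOR of the tapped register stages (stage i holds x^i).
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        scrambled.append(b ^ feedback)
        # Shift the register and feed the feedback bit back in.
        state = ((state << 1) | feedback) & mask
    return scrambled


# Even a worst-case constant payload comes out roughly DC-balanced.
payload = [1] * 256
print(sum(lfsr_scramble(payload)) / 256)  # fraction of 1s after scrambling
```

Because the scrambling sequence is pseudo-random and nearly balanced, the subsequent polar code sees whitened input, which is what keeps runs short without an explicit RLL constraint.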

Performance is evaluated along DC-balance, run-length, bit error rate (BER), and hardware complexity. These designs show superior error correction and instantaneous duty-cycle regularity versus classical RLL plus Reed–Solomon schemes, while requiring only a single short codeword and enabling soft-decision decoding via a 3-bit multi-level quantizer (Nguyen et al., 2018).
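
The soft-decision front end can be pictured with the hedged sketch below: a uniform 3-bit (8-level) quantizer applied to received OOK samples, whose outputs serve as soft inputs to the polar decoder. The thresholds, assumed signal levels, and centered soft metric are illustrative choices, not the cited paper's exact quantizer.

```python
import numpy as np

def quantize_soft_3bit(samples, low=0.0, high=1.0):
    """Map received OOK photodetector samples to 3-bit (8-level) soft values.

    `low` and `high` are the assumed nominal levels for bit 0 and bit 1.
    Returns the quantizer indices 0..7 and a centered soft metric running
    from -3.5 (confident 0) to +3.5 (confident 1).
    """
    samples = np.asarray(samples, dtype=float)
    step = (high - low) / 8.0
    levels = np.clip(((samples - low) / step).astype(int), 0, 7)
    return levels, levels - 3.5

levels, soft = quantize_soft_3bit([0.05, 0.48, 0.93])
print(levels, soft)  # [0 3 7] [-3.5 -0.5  3.5]
```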

In MIMO VLC, lighting encoders generate binary code matrices ensuring strict spatial/temporal DC balance and programmable dimming (Uday et al., 2019). The encoder maps each $k$-bit message to an $N_\text{T} \times N_\text{T}$ binary permutation/dimming matrix, enforcing constant sums per row and column (user/slot uniformity). The maximum run-length per LED is bounded as $L_r = 2 N_\text{T} (1 - \gamma)$, making it tunable via the dimming factor $\gamma$. The system achieves information rates $R = \lfloor \log_2(N_\text{T}!) \rfloor / N_\text{T}$ and strong minimum Hamming distance, with simulation showing rates saturating capacity for moderate SNR, robust uniformity, and flicker suppression.
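
The sketch below evaluates the stated rate and run-length formulas and shows one illustrative message-to-permutation-matrix mapping; the actual codebook construction and its dimming-matrix generalization in (Uday et al., 2019) may order and select codewords differently.

```python
import math
from itertools import permutations

def mimo_vlc_params(n_t: int, gamma: float):
    """Code rate and worst-case run-length of the permutation-matrix construction.

    R   = floor(log2(N_T!)) / N_T   bits per channel use
    L_r = 2 * N_T * (1 - gamma)     maximum run-length per LED (gamma = dimming factor)
    """
    rate = math.floor(math.log2(math.factorial(n_t))) / n_t
    max_run = 2 * n_t * (1 - gamma)
    return rate, max_run

def message_to_codeword(index: int, n_t: int):
    """Illustrative lexicographic mapping of a message index to an N_T x N_T
    permutation matrix (one '1' per row and per column, so every LED and every
    slot carries equal optical power). Only practical for small N_T."""
    perm = list(permutations(range(n_t)))[index]
    return [[1 if perm[row] == col else 0 for col in range(n_t)] for row in range(n_t)]

print(mimo_vlc_params(n_t=4, gamma=0.5))  # (1.0, 4.0) for a 4-LED array
print(message_to_codeword(0, 4))          # identity permutation matrix
```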

2. Neural Lighting Encoders: Compact Latent Spaces and Graph Representations

In computational photography and scene relighting, neural lighting encoders are employed to decompose scene images or panoramas into compact, expressive latent representations of illumination. These can be designed as:

  • Autoencoder Latent Codes: Deep CNN autoencoders, such as those in (Weber et al., 2018), map HDR panoramic environment maps $\mathbf{e} \in \mathbb{R}^{64 \times 128 \times 3}$ to low-dimensional vectors $z \in \mathbb{R}^Z$, with $Z \in \{32, 64, 128, 256, 512\}$. Reconstruction uses a log-L1 loss with solid-angle weighting to reflect radiometric fidelity across the hemisphere (a loss sketch follows this list). This compressed representation enables plausible relighting even from diffuse object views, using a separate predictor CNN that regresses $z$ from RGB+normal pairs. Compared to SH encoding, the learned space achieves lower scale-invariant errors at equal or lower dimension (Weber et al., 2018).
  • Graph-based Encodings: DSGLight (Bai et al., 2022) encodes indoor lighting as parameters over 128 fixed spherical Gaussian (SG) lobes, each lobe augmented with a depth value alongside its amplitude. Using a graph convolutional network (GCN), per-node features are regressed from single-view CNN features, with fixed lobe directions ensuring parameter stability and compactness (512D total). The DSGLight encoder enables spatially-varying, depth-aware illumination for AR relighting, reaching 22–26 dB PSNR on benchmark datasets (Bai et al., 2022).
  • Spatiotemporal Volumetric Encoders: SGLV (Li et al., 2023) lifts an input image and depth into a sparse 3D hint volume, encoding color, opacity, and emptiness across a grid. A 3D encoder-decoder with GRU fusion produces per-voxel spherical Gaussian parameters. Differentiable ray-tracing across this volume yields spatially and temporally consistent HDR environment maps for AR applications.
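
As referenced in the first item above, the solid-angle-weighted log-L1 reconstruction loss can be sketched as follows; the exact log parameterization, weight normalization, and hemisphere handling in the cited autoencoder may differ.

```python
import torch

def solid_angle_weighted_log_l1(pred, target, eps=1e-6):
    """Solid-angle-weighted log-L1 loss for HDR environment maps (sketch).

    pred, target: (B, 3, H, W) linear-radiance equirectangular maps,
    e.g. H, W = 64, 128. On an equirectangular grid the solid angle of a
    pixel is proportional to sin(theta) of its row's polar angle, so rows
    near the poles contribute less to the loss.
    """
    _, _, h, _ = pred.shape
    theta = (torch.arange(h, dtype=pred.dtype, device=pred.device) + 0.5) * torch.pi / h
    weight = torch.sin(theta).view(1, 1, h, 1)               # per-row solid-angle weight
    diff = (torch.log(pred + eps) - torch.log(target + eps)).abs()
    return (weight * diff).sum() / weight.expand_as(diff).sum()

# Toy usage with random "panoramas".
e_hat, e_ref = torch.rand(2, 3, 64, 128), torch.rand(2, 3, 64, 128)
print(solid_angle_weighted_log_l1(e_hat, e_ref).item())
```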

3. Lighting Encoders for Image/Video Relighting and Generation

Lighting encoders enable explicit or implicit control over synthetic image and video generation processes:

  • Disentangled Encoders: In self-supervised relighting pipelines, a lighting encoder extracts a dedicated low-dimensional vector (e.g., 9D SH coefficients) from raw images, disentangled from the content tensors (Liu et al., 2020). The encoder is regularized via SH transformation laws under geometric augmentation (flip, rotate, invert), enforced by a custom loss so that the illumination vector obeys the true transformation properties of physical lighting (see the SH-flip sketch after this list). At test time, any 9D SH vector can be imposed for arbitrary relighting, demonstrating near-supervised fidelity on portrait datasets.
  • Video and Diffusion Model Control: Modern T2V and generative pipelines employ transformer-based lighting encoders, which extract framewise latent maps or global direction embeddings and inject them via summation or cross-attention at all transformer layers (Zhang et al., 30 Oct 2024, Zheng et al., 11 Feb 2025). For instance, LumiSculpt (Zhang et al., 30 Oct 2024) encodes spatiotemporal lighting references using a VAE and deep transformer, combining spatial and temporal tokenization and positional encoding. Lighting control is merged into DiT denoising layers by channel-wise addition with a fixed guidance scale, enabling accurate, temporally coherent lighting manipulation.

VidCRAFT3 (Zheng et al., 11 Feb 2025) forms global lighting representations by projecting degree-4 SH embeddings of per-frame lighting direction through an MLP to match transformer hidden size. Parallel cross-attention modules at each UNet block enable simultaneous, fully decoupled conditioning on image, text, and lighting directions for fine-grained, multi-modal video generation.

  • Portrait Harmonization and Feature-based Guidance: In background replacement, lighting encoders extract feature tensors from backgrounds using deep CNNs, aligned or mapped to the feature space of reference HDR panoramas (Ren et al., 2023). The diffusion backbone leverages these features as additive controls at each U-Net resolution, harmonizing foreground lighting with scene context.
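
Returning to the disentangled SH encoder in the first item of this list, the sketch below shows one such closed-form transformation law: how a 9-D SH lighting vector changes under a horizontal flip. The real-SH basis ordering and the flip axis are assumptions and need not match the cited pipeline's conventions.

```python
import numpy as np

# Sign pattern of the 9 real SH basis functions, ordered
# (l,m) = (0,0),(1,-1),(1,0),(1,1),(2,-2),(2,-1),(2,0),(2,1),(2,2),
# under a left-right flip (x -> -x): basis terms with an odd power of x
# (Y_1^1 ~ x, Y_2^-2 ~ xy, Y_2^1 ~ xz) change sign, all others are unchanged.
FLIP_X_SIGNS = np.array([1, 1, 1, -1, -1, 1, 1, -1, 1], dtype=float)

def flip_sh_lighting(sh9):
    """Transform a 9-D SH lighting vector under a horizontal image flip.

    This is the kind of transformation law the self-supervised regularizer
    enforces on the predicted illumination vector.
    """
    return FLIP_X_SIGNS * np.asarray(sh9, dtype=float)

light = np.array([0.8, 0.1, 0.3, 0.5, 0.0, 0.02, 0.2, 0.1, 0.05])
print(flip_sh_lighting(light))  # x-dependent components change sign
```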

4. Lighting Encoder Design in Robotic and Adverse Imaging Systems

In adverse environments (e.g., underwater manipulation), lighting encoders are engineered for illumination-robust perception and policy control:

  • Fusion-based Embeddings: Bi-AQUA (Tsunoori et al., 20 Nov 2025) encodes ambient lighting from raw RGB images with a dual-path architecture: a CNN for learned features and a color-histogram MLP for global statistics, whose outputs are concatenated and projected to a low-dimensional vector $v_L$. Multi-view aggregation synthesizes a global lighting code.
  • Downstream Conditioning: The lighting code modulates visual backbone features via channel-wise FiLM and is injected to sequence transformers as a dedicated lighting token. All modules are supervised only through the imitation learning objective, without explicit lighting labels. Ablation studies show that FiLM-based modulation alone addresses static lighting, but only the full pipeline (encoder+FiLM+token) achieves robust adaptation to dynamic changes, yielding 80–100% success rates across static and rapidly-varying scenarios (Tsunoori et al., 20 Nov 2025).
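
A minimal sketch of the FiLM-plus-token conditioning described above follows; the layer widths, the (1 + gamma) parameterization, and where the lighting token is prepended are assumptions rather than the cited architecture.

```python
import torch
import torch.nn as nn

class LightingFiLM(nn.Module):
    """Condition visual features and a policy transformer on a lighting code (sketch)."""

    def __init__(self, light_dim: int, feat_channels: int, token_dim: int):
        super().__init__()
        self.to_film = nn.Linear(light_dim, 2 * feat_channels)   # -> [gamma, beta]
        self.to_token = nn.Linear(light_dim, token_dim)           # dedicated lighting token

    def forward(self, feats, v_light, tokens):
        # feats: (B, C, H, W) backbone features; v_light: (B, D_L); tokens: (B, T, token_dim)
        gamma, beta = self.to_film(v_light).chunk(2, dim=-1)
        feats = feats * (1 + gamma[..., None, None]) + beta[..., None, None]  # channel-wise FiLM
        light_token = self.to_token(v_light).unsqueeze(1)                     # (B, 1, token_dim)
        tokens = torch.cat([light_token, tokens], dim=1)                      # prepend lighting token
        return feats, tokens

mod = LightingFiLM(light_dim=8, feat_channels=64, token_dim=256)
f, t = mod(torch.randn(2, 64, 16, 16), torch.randn(2, 8), torch.randn(2, 10, 256))
print(f.shape, t.shape)  # torch.Size([2, 64, 16, 16]) torch.Size([2, 11, 256])
```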

5. Lighting Encoders in Light Field Compression and Rendering

Encoders also serve as the primary means for high-efficiency compression and neural replacement in light-field rendering and display:

  • 4D Volume and Transform-based Compression: In HDR light field streaming, a convolutional autoencoder followed by blockwise 4D discrete cosine transforms, perceptually-uniform color mapping, quantization, and HEVC coding achieves an efficient reduction of high-dimensional lighting data (Khaidem et al., 2022). The module learns to collapse myriad sub-aperture views into a highly-compressed representation (typically N=2 coded images), outperforming state-of-the-art codecs on PSNR and bitrate.
  • Neural Radiance Field Encoders for Complex Luminaires: For rendering arbitrary luminaire emission, a lighting encoder is instantiated as a NeRF-style MLP mapping 3D position and direction (via Fourier encoding) to SH coefficients and densities (Condor et al., 2022). This learned radiance function is distilled to a Plenoctree for real-time use, preserving HDR detail and high-frequency emission with compression by orders of magnitude in storage and evaluation cost.
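
The positional encoding feeding such a NeRF-style luminaire MLP can be sketched as a standard Fourier feature map; the number of frequency bands and whether the raw coordinates are concatenated are implementation choices the cited work may make differently.

```python
import torch

def fourier_encode(x, num_bands: int = 10):
    """NeRF-style Fourier feature encoding of positions or directions (sketch).

    x: (..., D) coordinates. Returns (..., D * 2 * num_bands) features
    [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_bands-1.
    """
    freqs = (2.0 ** torch.arange(num_bands, dtype=x.dtype, device=x.device)) * torch.pi
    angles = x[..., None] * freqs                  # (..., D, num_bands)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)               # (..., D * 2 * num_bands)

pos = torch.rand(4, 3)              # 3-D sample positions inside the luminaire volume
print(fourier_encode(pos).shape)    # torch.Size([4, 60])
```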

6. Quantitative Performance and Comparative Analysis

Lighting encoder evaluation metrics are domain-dependent, spanning DC-balance, run-length, BER, and hardware complexity in VLC; PSNR, scale-invariant error, and perceptual metrics in relighting, harmonization, and light-field compression; and task success rates in robotic manipulation.

Empirically, encoder designs favor low-bitrate or low-dimensionality representations that preserve essential lighting effects—such as temporal/spatial DC-balance, high-frequency light sources, or relighting fidelity—while supporting hardware, latency, or generative constraints of the end system.

7. Architectures, Losses, and Implementation Paradigms

Lighting encoders across domains employ diverse architectures, ranging from LFSR scramblers and polar encoders in VLC hardware to CNN autoencoders, graph convolutional networks, 3D encoder-decoders, and transformer-based encoders in neural pipelines.

Losses are correspondingly adapted: solid-angle-weighted log-L1 in HDR reconstruction (Weber et al., 2018), SH regularization (Liu et al., 2020), LPIPS and other perceptual losses for image harmonization (Ren et al., 2023), and primary task losses (e.g., behavior cloning) in robotics (Tsunoori et al., 20 Nov 2025).

The trend is toward plug-and-play encoders, easily integrated into hardware or large-scale, multi-modal neural networks, facilitating precise and stable control over lighting—and thus appearance or communication quality—across applications.
