Neural Irradiance Volume for Real-Time Rendering
- Neural Irradiance Volume is a neural representation that regresses a continuous 5D irradiance field conditioned on position and direction for real-time diffuse illumination.
- It utilizes a dual-branch architecture with multi-level hash encoding and a compact MLP to achieve high-fidelity rendering within a 1–5 MB memory footprint.
- NIV supports dynamic and temporal effects by conditioning the network on position, direction, and optional temporal inputs, enabling smooth indirect lighting with improved quality over probe-based methods.
Neural Irradiance Volume (NIV) is a neural representation designed for real-time rendering of diffuse global illumination, aiming to replace conventional probe-based volumetric irradiance schemes with a learned, highly compact, and memory-efficient model. NIV achieves this by directly regressing a continuous 5D field of irradiance conditioned on position and direction, using neural compression strategies that circumvent the cubic memory costs and structural artifacts of dense probe grids. The approach is distinguished by its suitability for real-time constraints, enabling high-fidelity rendering of indirect lighting effects for both static and dynamic content with a minimal hardware and runtime footprint (Coomans et al., 13 Feb 2026).
1. Formulation of the Irradiance Field
Diffuse irradiance at a given location and orientation, denoted $E(\mathbf{x}, \mathbf{n})$, is defined as

$$E(\mathbf{x}, \mathbf{n}) = \int_{\Omega(\mathbf{n})} L(\mathbf{x}, \omega)\,(\mathbf{n} \cdot \omega)\, d\omega,$$

where $\mathbf{x}$ is a spatial point, $\mathbf{n}$ is the target normal, $L$ is the incoming radiance, and $\Omega(\mathbf{n})$ is the hemisphere about $\mathbf{n}$. Conventional probe grids discretize $E$ at regular grid locations and encode its angular dependence via spherical harmonics, requiring $O(N^3)$ per-scene memory for a cubic grid of resolution $N$. NIV models $E$ as a continuous function parameterized by a neural network:

$$E_\theta(\mathbf{x}, \mathbf{n}) \approx E(\mathbf{x}, \mathbf{n}),$$

where $E_\theta$ is instantiated as a compact multilayer perceptron (MLP) with auxiliary position and direction encodings. This architecture decouples representation quality from grid resolution and enables direct volumetric queries at arbitrary $(\mathbf{x}, \mathbf{n})$. Embedding strategies include positional (Fourier) encoding for smaller models and learned multi-level hash encoding for scalable compression; the direction is either embedded with a low-order spherical encoding or fed raw as a 3-vector.
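As an illustration, the positional (Fourier) encoding used for the smallest models can be sketched as follows. This is a minimal NumPy sketch; the band count `num_bands` is an illustrative choice, not a published hyperparameter:

```python
import numpy as np

def fourier_encode(x, num_bands=6):
    """Map each coordinate of x to sin/cos features at octave frequencies."""
    x = np.asarray(x, dtype=np.float32)
    freqs = 2.0 ** np.arange(num_bands)        # 1, 2, 4, ..., 2^(num_bands-1)
    angles = np.pi * x[..., None] * freqs      # (..., 3, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)    # (..., 3 * 2 * num_bands)
```

For a 3D point this yields a 36-dimensional feature vector (3 coordinates, 6 bands, sin and cos), which is concatenated with the direction input before the MLP.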
2. Model Architecture and Neural Compression
The NIV system comprises two main encoding branches:
- A multi-level hash encoding of $\mathbf{x}$, with features defined as
$$h(\mathbf{x}) = \big(\mathrm{interp}(\mathbf{x}; T_1), \ldots, \mathrm{interp}(\mathbf{x}; T_L)\big),$$
where $T_1, \ldots, T_L$ are trainable lookup tables and $L$ is the number of hash levels.
- Direction encoding, either via a low-dimensional spherical basis or directly as a 3D vector.
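A minimal NumPy sketch of the multi-level hash lookup follows. The hash primes, base resolution, and growth factor follow common Instant-NGP-style conventions and are assumptions here, not values from the paper:

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # assumed spatial-hash primes

def hash_encode(x, tables, base_res=16, growth=2.0):
    """Concatenate trilinearly interpolated features from L hashed grids.

    x      : (3,) point in [0, 1]^3
    tables : list of (T, F) trainable feature arrays, one per level
    """
    feats = []
    for level, table in enumerate(tables):
        T, F = table.shape
        res = int(base_res * growth ** level)          # grid resolution at this level
        pos = np.asarray(x) * res
        lo = np.floor(pos).astype(np.uint64)
        frac = pos - lo
        acc = np.zeros(F, dtype=np.float32)
        for corner in range(8):                        # 8 corners of the enclosing cell
            offs = np.array([(corner >> d) & 1 for d in range(3)], dtype=np.uint64)
            idx = lo + offs
            h = int(np.bitwise_xor.reduce(idx * PRIMES) % T)        # hash the corner index
            w = np.prod(np.where(offs == 1, frac, 1.0 - frac))      # trilinear weight
            acc += w.astype(np.float32) * table[h]
        feats.append(acc)
    return np.concatenate(feats)                       # (L * F,) feature vector
```

The trilinear weights at each level sum to one, so the output interpolates smoothly as $\mathbf{x}$ moves through a cell.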
The concatenated feature vector is processed by a 4-layer fully connected MLP (with ReLU activations except at the output), whose hidden width (typically 64–256 units) and number of hash levels ($L$) are adjusted for the desired memory and inference speed. No normalization layers are used. Model sizes vary from 0.003 MB (positionally encoded only) up to 5.4 MB (8-level hash, 64-width MLP). Example pseudocode structure:

```
function query_NIV(x, n):
    hx = hash_encode(x)        # F-dimensional hash features
    dx = dir_encode(n)         # 3- or higher-dimensional
    z  = concat(hx, dx)
    for i in 1..3:
        z = ReLU(W[i] * z + b[i])
    return W[4] * z + b[4]     # scalar irradiance
```
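The pseudocode above can be made concrete with a small NumPy forward pass. The weights here are random stand-ins for trained values, and the input dimension `in_dim=19` (e.g. 16 hash features plus a raw 3D normal) is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden=64, layers=4, out_dim=1):
    """Random stand-in weights for the 4-layer MLP (trained values in practice)."""
    dims = [in_dim] + [hidden] * (layers - 1) + [out_dim]
    return [(0.1 * rng.standard_normal((dims[i + 1], dims[i])).astype(np.float32),
             np.zeros(dims[i + 1], dtype=np.float32)) for i in range(layers)]

def mlp_forward(z, params):
    """ReLU on hidden layers, linear output, matching the pseudocode above."""
    for W, b in params[:-1]:
        z = np.maximum(W @ z + b, 0.0)
    W, b = params[-1]
    return W @ z + b

params = make_mlp(in_dim=19)                    # assumed input width for illustration
z = rng.standard_normal(19).astype(np.float32)  # concatenated hash + direction features
irradiance = mlp_forward(z, params)             # shape (1,)
```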
3. Data Preparation and Training Objectives
Training samples are drawn from path-traced ground-truth indirect irradiance. 20% of the dataset samples are drawn precisely at surface points (with $\mathbf{n}$ set to the true surface normal), supporting sharp shadow boundaries and accurate surface details; the remainder are sampled volumetrically. Samples inside geometry, as determined by back-facing normals at the first hit, are excluded. The loss function is a stabilized relative mean squared error:

$$\mathcal{L} = \frac{\big(E_\theta(\mathbf{x}, \mathbf{n}) - E(\mathbf{x}, \mathbf{n})\big)^2}{E_\theta(\mathbf{x}, \mathbf{n})^2 + \epsilon},$$

with a small stabilization constant $\epsilon$.
Optimization employs Adam with a decaying learning-rate schedule over approximately 50k iterations. No explicit regularizers are required to manage hash collisions; gradient competition provides implicit spatial allocation.
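A hedged NumPy version of such a stabilized relative MSE is sketched below; the stabilization constant `eps` is an assumed value, not one from the paper:

```python
import numpy as np

def relative_mse(pred, target, eps=1e-2):
    """Stabilized relative squared error: dividing by the squared prediction
    down-weights bright regions so dark areas are fitted to similar
    relative accuracy (eps is an illustrative choice)."""
    pred = np.asarray(pred, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    return np.mean((pred - target) ** 2 / (pred ** 2 + eps))
```

In a training framework the denominator would typically be excluded from the gradient, but the value computed is the same.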
4. Memory Scaling and Empirical Quality Comparison
Traditional probe grids with $N^3$ nodes and 9 SH coefficients per color channel at half-precision require $2 \cdot 9 \cdot 3 \cdot N^3 = 54N^3$ bytes. NIV total memory consumption is the sum of MLP weights and hash-table entries, stored at half precision:

$$M_{\mathrm{NIV}} = 2\Big(|\theta_{\mathrm{MLP}}| + \textstyle\sum_{l=1}^{L} |T_l| \cdot F\Big)\ \text{bytes},$$

where $|T_l|$ is the number of entries in the level-$l$ table and $F$ is the per-level feature dimension.
Empirical results on the Sponza scene show that a roughly 1 MB NIV achieves substantially lower MSE than a probe grid at an equivalent memory budget. MSE-to-memory scaling consistently favors NIV over probe-based approaches across multiple budgets.
| Hash Levels | Memory (MB) | Full-HD Time (ms) | MSE (Sponza) |
|---|---|---|---|
| 0 (PE only) | 0.003 | 0.19 | 1.2e-4 |
| 2 | 0.16 | 0.31 | 4.5e-5 |
| 4 | 1.20 | 0.67 | 2.1e-5 |
| 6 | 3.30 | 1.06 | 1.2e-5 |
| 8 | 5.40 | 1.35 | 9.1e-6 |
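The two memory models can be compared with a short calculation. The byte layout assumed here (9 SH coefficients per color channel at FP16) is the interpretation consistent with the ~14 MB probe-grid figure quoted later in this article:

```python
def probe_grid_bytes(n, sh_coeffs=9, channels=3, bytes_per_coeff=2):
    """Half-precision SH probe grid: n^3 probes, 9 coefficients per channel."""
    return n ** 3 * sh_coeffs * channels * bytes_per_coeff

def niv_bytes(mlp_params, table_entries, feat_dim, bytes_per_param=2):
    """NIV footprint: MLP weights plus all hash-table features, FP16."""
    return (mlp_params + table_entries * feat_dim) * bytes_per_param

# A 64^3 probe grid already needs ~14 MB:
print(probe_grid_bytes(64) / 1e6)  # → 14.155776
```

Even the largest NIV configuration in the table above (5.4 MB) undercuts this budget, while the smallest is three orders of magnitude smaller.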
5. Real-Time Inference and Integration Pipeline
At runtime, the scene is rasterized into a G-buffer that records per-pixel world-space position $\mathbf{x}$, normal $\mathbf{n}$, and albedo $\rho$. For each pixel, the system:
- Computes hash and direction encodings for $(\mathbf{x}, \mathbf{n})$.
- Runs the 4-layer MLP to predict $E_\theta(\mathbf{x}, \mathbf{n})$.
- Calculates indirect diffuse radiance $L_{\mathrm{indirect}} = \frac{\rho}{\pi}\, E_\theta(\mathbf{x}, \mathbf{n})$.
- Adds direct illumination and emitted light as applicable.
A single full-screen pass suffices. Optional features include half-resolution shading followed by bilinear upsampling (0.37 ms for 8 hash levels) and a dynamic ambient occlusion pass (0.2 ms) focused on dynamic geometry. Per-pixel cost scales with the number of hash lookups ($L$ interpolated levels) and the MLP's FLOPs, with measured latencies of 0.19 ms to 1.35 ms per frame (full HD, RTX 4090, FP16). Integration into graphics pipelines replaces probe-interpolation shaders with a neural-volume evaluation on G-buffer inputs.
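The per-pixel composition step can be sketched as follows; this is a minimal Lambertian shading sketch assuming an RGB irradiance output from the network:

```python
import numpy as np

def shade_pixel(albedo, E_indirect, L_direct):
    """Compose final radiance: Lambertian indirect term plus direct lighting.

    albedo     : (3,) diffuse albedo from the G-buffer
    E_indirect : (3,) irradiance predicted by the NIV network
    L_direct   : (3,) direct lighting (e.g. shadow-mapped), added separately
    """
    return np.asarray(albedo) / np.pi * np.asarray(E_indirect) + np.asarray(L_direct)
```

In the real pipeline this runs in the full-screen shader immediately after the network evaluation, one invocation per G-buffer pixel.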
6. Support for Dynamic and Temporal Effects
NIV is volumetric and independent of mesh topology or surface parameterization, permitting queries for novel or moving objects without retraining. Temporal effects, such as time-of-day changes, are supported natively by extending the input with a scalar parameter $t$ (e.g., sun angle), encoded by a small Fourier basis:

$$\gamma(t) = \big(\sin(2^0\pi t), \cos(2^0\pi t), \ldots, \sin(2^{K-1}\pi t), \cos(2^{K-1}\pi t)\big).$$

The network is trained on randomized $t$ values, and at inference the current $t$ is passed directly to the model. This enables real-time rendering of smoothly varying illumination due to dynamic scene factors without additional runtime cost or model retraining.
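A sketch of such a scalar time encoding (the band count and the $2\pi$ frequency scaling are illustrative assumptions):

```python
import numpy as np

def time_encode(t, num_bands=4):
    """Small Fourier basis for a scalar time-of-day parameter t in [0, 1]."""
    freqs = 2.0 ** np.arange(num_bands)         # 1, 2, 4, 8
    ang = 2.0 * np.pi * t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])  # (2 * num_bands,) features
```

The resulting features are simply concatenated with the position and direction encodings before the MLP, so time conditioning adds no extra network passes.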
7. Performance Analysis, Limitations, and Directions for Extension
NIV’s key performance metrics include frame times of approximately 1 ms at full HD resolution and memory footprints between 0.003 MB and 5.4 MB, substantially lower than probe grids (14 MB) or production DDGI (35 MB). Quality improvements are consistently observed, with NIV exhibiting more accurate indirect lighting, fewer light leaks, and sharper contact shadows. Training ablations demonstrate that culling samples inside geometry reduces error by 15%, and surface-biased sampling helps preserve shadow detail. Pre-integrating irradiance rather than regressing raw radiance further accelerates convergence and improves runtime stability.
Current limitations include:
- Restriction to indirect diffuse illumination; direct illumination still requires conventional shadow mapping, and scaling up to scenes with many shadow-mapped lights may incur added cost or noise.
- Large dynamic occluders are not fully modeled; existing AO approximates only local occlusion.
- Glossy transport is not addressed except via deferral to NIV at diffuse bounces. Extension to support glossy materials would require augmenting the model, for example by encoding surface roughness.
- Capacity saturation in very large scenes, suggesting a need for tiled or level-of-detail variants.
- Opportunities remain for further compression (e.g., hash probing, pruning, quantization) and for online updates via fine-tuning on recent samples, trading accuracy for latency as necessary.
NIV is fully differentiable and compatible with inverse-rendering applications, including light optimization and mixed-reality pipelines, contingent on appropriate integration with downstream assets (Coomans et al., 13 Feb 2026).