Real-time NeRF Rendering
- Real-time NeRF Rendering is a paradigm that leverages spatial and hybrid data structures to convert classic, computation-heavy NeRFs into interactive visualizations.
- Key techniques include precomputed octrees, voxel grids, and adaptive sampling that drastically reduce per-ray MLP evaluations while preserving view-dependent fidelity.
- The approach enables deployment on GPUs, mobile devices, and AR/VR headsets by balancing high frame rates with minimal quality loss through hardware-aware optimizations.
Real-time Neural Radiance Field (NeRF) Rendering refers to the suite of methods and data structures that enable the interactive visualization of scenes represented implicitly as neural radiance fields, reaching frame rates orders of magnitude higher than the original NeRF implementations. Classical NeRF, though highly effective for novel view synthesis, incurs prohibitive inference costs, requiring hundreds of multilayer perceptron (MLP) evaluations per camera ray, resulting in frame rates of ≪1 FPS. Real-time NeRF rendering achieves the necessary acceleration via architectural factorization, spatial data structure precomputation, adaptive sample reduction, mesh rasterization pipelines, and hardware-aware optimizations, enabling deployment on commodity GPUs, resource-constrained mobile devices, and AR/VR headsets with minimal quality loss.
1. Core Challenges and Principles
Let NeRF represent a scene as a function F_Θ: (x, d) → (σ, c), predicting the volume density σ and view-dependent color c at position x ∈ ℝ³ and viewing direction d ∈ 𝕊². Rendering via volume integration requires evaluating F_Θ 128–512 times per ray and compositing the results, a process dominated by memory-bound MLP inference.
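For concreteness, the quadrature that dominates runtime fits in a few lines of numpy; the sample count and values below are illustrative, and in classic NeRF every σ/c entry costs one MLP call:

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Standard NeRF volume-rendering quadrature for one ray.

    sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) segment lengths.
    Returns the composited RGB value. Illustrative sketch, not any paper's code.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance T_i
    weights = trans * alphas                                         # compositing weights
    return (weights[:, None] * colors).sum(axis=0)

# Classic NeRF pays N = 128-512 MLP calls per ray just to obtain sigmas/colors.
sigmas = np.random.rand(256); colors = np.random.rand(256, 3); deltas = np.full(256, 0.01)
rgb = composite_ray(sigmas, colors, deltas)
```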
The core challenge is to retain NeRF’s expressivity and view-dependent fidelity while reducing the online computational bottleneck at rendering time. Approaches achieve this by:
- Baking neural outputs into explicit spatial data structures (octrees, sparse grids, triangle meshes)
- Factorizing radiance to remove the dependence on per-ray MLPs (e.g., spherical harmonics, light field decomposition)
- Aggressively pruning or learning to allocate samples only to critical locations
- Mapping neural fields to rasterization pipelines compatible with standard graphics hardware
- Designing device-aware systems and dataflows for efficient on-device or distributed execution
2. Spatial Acceleration Structures and Representation Compression
Octree and Voxel-Grid Models
The PlenOctree approach (Yu et al., 2021) precomputes a sparse octree whose leaves store density and spherical harmonic coefficients for view-dependent radiance, enabling constant-time hierarchical traversal and O(1) radiance recovery per sample. Pre-tabulation and pruning yield runtimes >150 FPS and >3000× speedup over conventional NeRF (800×800 images), with minimal quality loss (PSNR drop ≤1 dB).
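A minimal sketch of the O(1) leaf shading step, assuming degree-2 real spherical harmonics and per-channel coefficient vectors (the published implementation differs in layout and degree):

```python
import numpy as np

def sh_basis_deg2(d):
    """Real spherical-harmonic basis up to degree 2 (9 terms) for a unit direction d."""
    x, y, z = d
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def leaf_radiance(sh_coeffs, d):
    """O(1) view-dependent color from a leaf's stored SH coefficients (shape (3, 9))."""
    return np.clip(sh_coeffs @ sh_basis_deg2(d), 0.0, None)

coeffs = np.random.randn(3, 9) * 0.1          # baked per-leaf coefficients in practice
print(leaf_radiance(coeffs, np.array([0.0, 0.0, 1.0])))
```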
Fourier PlenOctree (Wang et al., 2022) extends octree leaves to hold discrete Fourier transform (DFT) coefficients over temporal frames for dynamic scenes, supporting fast time-varying rendering. Across 60-frame sequences, it achieves 100 FPS at 800×800 resolution, with PSNR up to 35.2 dB.
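A hedged sketch of recovering a leaf's time-varying density from baked coefficients; the cosine basis and normalization here are illustrative stand-ins for the paper's exact DFT formulation:

```python
import numpy as np

def timevarying_density(fourier_coeffs, t, num_frames):
    """Recover a leaf's density at frame t from K real Fourier-style coefficients.

    fourier_coeffs: (K,) per-leaf coefficients baked over a num_frames sequence.
    """
    k = np.arange(len(fourier_coeffs))
    basis = np.cos(np.pi * k * (t + 0.5) / num_frames)   # cosine basis over time
    return float(fourier_coeffs @ basis)

coeffs = np.random.randn(8)                               # hypothetical baked values
print([round(timevarying_density(coeffs, t, 60), 3) for t in range(0, 60, 10)])
```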
Sparse Neural Grids and Baked Volumes
SNeRG (Hedman et al., 2021) bakes NeRF densities, diffuse colors, and low-dimensional view features into a highly pruned block-sparse 3D grid. Online, the per-ray cost is reduced to a handful of trilinear texture fetches plus one tiny MLP evaluation per pixel, enabling ~84 FPS with a typical scene footprint of ~86 MB.
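The sketch below illustrates both online pieces, a trilinear fetch and the tiny per-pixel network, with hypothetical shapes and randomly initialized weights; in SNeRG the MLP input is the alpha-composited feature along the ray, not a single sample:

```python
import numpy as np

def trilerp(grid, p):
    """Trilinear fetch from a dense grid (D, H, W, C) at continuous p = (x, y, z)."""
    i = np.floor(p).astype(int); f = p - i
    c = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((1 - f[0]) if dx == 0 else f[0]) * \
                    ((1 - f[1]) if dy == 0 else f[1]) * \
                    ((1 - f[2]) if dz == 0 else f[2])
                c = c + w * grid[i[2] + dz, i[1] + dy, i[0] + dx]
    return c

def deferred_pixel(acc_diffuse, acc_feature, view_dir, W1, b1, W2, b2):
    """One tiny-MLP evaluation per pixel: a view-dependent residual added to the
    alpha-composited diffuse color (schematic of SNeRG-style deferred shading)."""
    h = np.maximum(np.concatenate([acc_feature, view_dir]) @ W1 + b1, 0.0)  # ReLU layer
    specular = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))                          # sigmoid RGB
    return np.clip(acc_diffuse + specular, 0.0, 1.0)

grid = np.random.rand(16, 16, 16, 4)             # baked per-voxel features (toy size)
feat = trilerp(grid, np.array([3.3, 7.8, 2.1]))  # a handful of fetches per sample
Cf, H = 4, 16
W1 = np.random.randn(Cf + 3, H) * 0.1; b1 = np.zeros(H)
W2 = np.random.randn(H, 3) * 0.1; b2 = np.zeros(3)
print(deferred_pixel(np.array([0.2, 0.3, 0.4]), feat, np.array([0.0, 0.0, 1.0]), W1, b1, W2, b2))
```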
Mesh-based Compression
Re-ReND (Rojas et al., 2023), Omni-Recon (Fu et al., 2024), and MixRT (Li et al., 2023) convert NeRFs into coarse or moderately detailed triangle meshes (with texture atlases and view-dependent features) that drive conventional rasterization pipelines. These can be rendered with standard hardware at 30–70 FPS and storage footprints below 100 MB for indoor scenes.
3. Factorizations and Learning-Based Sample Reduction
Radiance Field Factorizations
FastNeRF (Garbin et al., 2021) decouples scene representation into a position-dependent network yielding density and a "deep radiance map," and a direction-dependent network producing blending weights. The result is a reduction in per-sample cost to one dot-product between cached features, enabling >200 FPS at high resolutions, with no significant metric degradation.
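The factorization reduces per-sample shading to an inner product between cached tables. A schematic numpy version, with K and all values hypothetical:

```python
import numpy as np

def fastnerf_color(uvw, beta):
    """FastNeRF-style factorized radiance: color(x, d) = sum_k beta_k(d) * u_k(x).

    uvw:  (K, 3) position-dependent 'deep radiance map' components u_k(x)
    beta: (K,)   direction-dependent blending weights beta_k(d)
    Both are fetched from dense caches at render time, so the per-sample cost
    collapses to this single inner product.
    """
    return beta @ uvw

K = 8
uvw = np.random.rand(K, 3)                  # cached position factors
beta = np.random.rand(K); beta /= beta.sum()  # cached direction weights
print(fastnerf_color(uvw, beta))
```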
Adaptive Sample and Importance Networks
AdaNeRF (Kurz et al., 2022) introduces a dual-network architecture: a sampling network predicts per-ray importance masks, selecting the critical subset of samples, and a shading network produces densities and colors at those selected points. During rendering, only 2–8 samples per ray are needed (vs 256), yielding 24.8–78.9 ms/frame at high PSNR (~31 dB on synthetic scenes), representing a 70–100× speedup.
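The sampling side of the idea can be sketched as a top-k selection over network-predicted importance; the real system uses learned sparsity and thresholds rather than a fixed k:

```python
import numpy as np

def select_samples(importance, k):
    """Keep the k most important sample locations per ray, as predicted by a
    sampling network (sketch of the dual-network idea).

    importance: (R, N) per-ray importance over N candidate depths.
    Returns indices of shape (R, k), sorted front-to-back for compositing.
    """
    idx = np.argpartition(-importance, k, axis=1)[:, :k]   # top-k, unordered
    return np.sort(idx, axis=1)                             # restore depth order

imp = np.random.rand(4, 256)     # 4 rays, 256 candidate samples each
print(select_samples(imp, 8))    # only 8 shading-network evaluations per ray
```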
Radiance Distribution Fields
NeRDF (Wu et al., 2023) models density and color along the ray as finite Fourier series, with network-predicted per-ray weights. The core integral is computed via a single network pass and volume integration over the analytical basis, enabling 254× acceleration vs classic NeRF with just ≈1 dB PSNR loss.
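A sketch of the idea: one network pass yields per-ray series weights, after which densely evaluating the analytic basis and compositing is nearly free. The basis choice, shapes, and rectification below are illustrative assumptions:

```python
import numpy as np

def eval_series(weights, t, period=1.0):
    """Evaluate a finite cosine series with K terms at depths t (vectorized),
    clamped to nonnegative values so it can serve as a density profile."""
    k = np.arange(len(weights))[:, None]
    return np.maximum(weights @ np.cos(2 * np.pi * k * t[None, :] / period), 0.0)

w_sigma = np.random.randn(8) * 0.5            # per-ray weights from one network pass
t = np.linspace(0.0, 1.0, 256)
sigma = eval_series(w_sigma, t)               # density along the ray, no MLP calls
deltas = np.full_like(t, t[1] - t[0])
alpha = 1.0 - np.exp(-sigma * deltas)
T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
print("expected termination depth:", (T * alpha * t).sum() / max((T * alpha).sum(), 1e-8))
```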
4. Mesh and Light Field Rasterization Pipelines
Duplex and Multi-layer Meshes
NDRF (Wan et al., 2023) distills a NeRF into two nested triangle meshes (outer and inner isosurfaces), encoding per-vertex neural radiance features. These are rasterized to image-space buffers and shaded using a screen-space convolutional network that aggregates both local geometric context and view direction. The pipeline achieves 60–283 FPS on modern GPUs, retaining NeRF-level fidelity (PSNR up to 32.1 dB).
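The shading stage amounts to convolutions over rasterized per-pixel feature and view-direction buffers. A toy single-layer stand-in with random weights (the actual network stacks several such layers and channels):

```python
import numpy as np

def screen_space_shade(feat_buf, view_buf, kernel, bias):
    """One 3x3 screen-space convolution over rasterized per-pixel radiance
    features concatenated with view direction, producing RGB."""
    H, W, _ = feat_buf.shape
    x = np.concatenate([feat_buf, view_buf], axis=-1)   # (H, W, C) input buffer
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, 3))
    for dy in range(3):
        for dx in range(3):
            patch = pad[dy:dy + H, dx:dx + W]           # shifted window
            out += patch @ kernel[dy, dx]               # (C, 3) weights per tap
    return np.clip(out + bias, 0.0, 1.0)

H, W, Cf = 4, 4, 8
feats = np.random.rand(H, W, Cf)                        # rasterized vertex features
views = np.random.rand(H, W, 3)                         # per-pixel view directions
k = np.random.randn(3, 3, Cf + 3, 3) * 0.05
print(screen_space_shade(feats, views, k, np.zeros(3)).shape)
```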
View-dependent, Hybrid Compositions
MixRT (Li et al., 2023) demonstrates that high fidelity can be attained using coarse meshes with view-dependent displacements (encoded as SH maps), followed by a compressed NeRF (hash encoder + MLP). Rendering proceeds via hardware rasterization plus a fragment shader, delivering 31 FPS at 1280×720 with storage under 100 MB.
General-purpose Scene Composition
The NeDF-based pipeline (Gao et al., 2023) enables direct O(1) neural intersection queries between rays and scenes, compositing multiple NeRF objects under arbitrary transforms with dynamic shadowing and analytical lighting, and scales in real time with the number of objects and scene complexity.
5. Mobile and Edge Inference: System-level Optimization
Automated Resource Allocation
NeRFlex (Wang et al., 2025) segments scenes into detail-rich objects and background clusters using segmentation and frequency analysis. Each sub-scene is assigned a mesh granularity and texture size via a profiler that maps configuration tuples to memory use and SSIM. The system then solves the resulting multiple-choice knapsack problem with dynamic programming, producing optimal trade-offs under strict device RAM budgets and yielding real-time rendering of complex scenes (35 FPS on an iPhone 13) with SSIM ≥0.886.
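The allocation step is a standard multiple-choice knapsack, solvable by dynamic programming over the memory budget. A compact sketch with hypothetical profiler tuples:

```python
def mc_knapsack(groups, budget):
    """Multiple-choice knapsack: pick exactly one (mem, ssim) configuration per
    sub-scene to maximize total SSIM under a RAM budget (sketch of the
    NeRFlex-style DP; the tuples below are hypothetical profiler outputs).

    groups: list of lists of (mem_mb, ssim) options; budget: int MB.
    """
    NEG = float("-inf")
    dp = [NEG] * (budget + 1)
    dp[0] = 0.0
    for options in groups:                       # one sub-scene at a time
        nxt = [NEG] * (budget + 1)
        for used in range(budget + 1):
            if dp[used] == NEG:
                continue
            for mem, ssim in options:            # choose exactly one config
                if used + mem <= budget:
                    nxt[used + mem] = max(nxt[used + mem], dp[used] + ssim)
        dp = nxt
    return max(dp)

groups = [[(120, 0.95), (60, 0.90), (30, 0.84)],   # detail-rich object
          [(80, 0.92), (40, 0.88)],                # background cluster
          [(50, 0.91), (25, 0.86)]]
print(mc_knapsack(groups, budget=200))
```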
Quantization and On-device Baked Execution
The system study (Wang et al., 2024) quantifies the impact of mesh granularity, patch size, and MLP quantization on perceptual quality versus FPS for mobile NeRFs. The dominant knob is mesh granularity: increasing it yields a steep PSNR gain with negligible FPS loss. MLP quantization (FP16) incurs small (<0.2 dB) degradation but yields little speed gain due to mobile hardware limitations. Optimal configurations enable 60 FPS at high visual quality with downloads under 20 MB.
RT-NeRF (Li et al., 2022) combines algorithmic structure (looping over pre-existing occupied cells in a sparse grid) and hardware features (hybrid bitmap/coordinate encoding, fixed-latency sparse search) to simultaneously achieve >30 FPS and minimize DRAM bandwidth. Coarse-grained view-dependent ordering further enables early ray termination and pipeline utilization above 90%.
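The algorithmic half of the design, consulting an occupancy bitmap so only occupied cells reach the MLP, can be sketched in software as follows (RT-NeRF performs this search in fixed-latency hardware with hybrid encodings; the uniform stepping here is a simplification):

```python
import numpy as np

def march_occupied(origin, direction, occupancy, voxel=1.0, n_steps=64):
    """Step a ray through a dense occupancy bitmap, returning only occupied cells."""
    hits = []
    p = origin.astype(float)
    for _ in range(n_steps):
        idx = np.floor(p / voxel).astype(int)
        if np.any(idx < 0) or np.any(idx >= np.array(occupancy.shape)):
            break                                        # left the grid
        cell = tuple(idx)
        if occupancy[cell] and (not hits or hits[-1] != cell):
            hits.append(cell)                            # only these cells reach the MLP
        p += direction * voxel
    return hits

occ = np.zeros((8, 8, 8), dtype=bool); occ[3:5, 3:5, 3:5] = True
print(march_occupied(np.array([0.2, 0.2, 0.2]), np.array([1.0, 1.0, 1.0]) / np.sqrt(3), occ))
```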
6. Scalability, Composability, and Path Tracing
Multi-NeRF and Adaptive Partitioning
Adaptive Multi-NeRF (Wang et al., 2023) partitions large scenes spatially with KD-trees based on guidance density. Each cell is assigned a tiny NeRF MLP, and samples are sorted per node and batch-evaluated, maximizing GPU occupancy and reducing per-frame cost. This yields 30–40% faster render times vs. comparable non-adaptive methods, while matching or exceeding PSNR.
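The dispatch pattern, sorting samples by their owning KD-tree leaf so each tiny MLP runs once on a contiguous batch, can be sketched as follows (KD-tree construction and the MLPs themselves are omitted):

```python
import numpy as np

def batch_by_node(sample_positions, node_ids):
    """Group ray samples by the KD-tree leaf (tiny MLP) that owns them, so each
    network is evaluated once on one contiguous batch, maximizing occupancy."""
    order = np.argsort(node_ids, kind="stable")      # sort samples by owning node
    sorted_ids = node_ids[order]
    splits = np.flatnonzero(np.diff(sorted_ids)) + 1
    batches = np.split(order, splits)                # one index batch per tiny MLP
    return {int(sorted_ids[b[0]]): sample_positions[b] for b in batches}

pos = np.random.rand(10, 3)
ids = np.random.randint(0, 3, size=10)               # which tiny NeRF owns each sample
for node, batch in batch_by_node(pos, ids).items():
    print(node, batch.shape)                         # evaluate MLP[node] on its batch
```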
Implicit Surface Representations
KiloNeuS (Esposito et al., 2022) stores surfaces as signed distance functions (SDF) partitioned into grids of tiny MLPs, enabling fast sphere-tracing for explicit surface intersection. Distillation and eikonal-regularized fine-tuning ensure geometric and photometric fidelity, supporting both real-time view synthesis and interactive path tracing with global illumination (≈46 FPS at 1280×720).
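Sphere tracing exploits the SDF property that its value bounds the distance to the nearest surface, so it is always a safe step. A minimal sketch against an analytic SDF standing in for KiloNeuS's grid of tiny MLPs:

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4, t_max=10.0):
    """Sphere tracing: advance by the distance the SDF returns until the surface
    is within eps. In KiloNeuS the sdf callable would be the tiny MLP owning the
    current grid cell; here it is an analytic unit sphere."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t                    # surface hit at ray parameter t
        t += d                          # SDF value is a safe step size
        if t > t_max:
            break
    return None                         # missed the scene

unit_sphere = lambda p: np.linalg.norm(p) - 1.0
print(sphere_trace(unit_sphere, np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0])))
```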
Hybrid Volume–Mesh Simulation
Dynamic Mesh-Aware Radiance Fields (Qiao et al., 2023) integrate NeRF and polygonal path tracing. Rays alternate between NeRF volume segments and mesh bounces, updating throughput, radiance, and supporting mesh-NeRF physical interaction (collisions, cloth, rigid body). Rendering and simulation run at ~20–40 FPS for moderate scene sizes.
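The integrator alternates between volume segments and mesh hits while carrying throughput, schematically as below (the segment encoding and the opaque-mesh simplification are assumptions; real systems continue with a BSDF bounce at surfaces):

```python
import numpy as np

def composite_segments(segments):
    """Carry radiance and throughput along a ray that alternates between NeRF
    volume segments and mesh hits. Each entry is ('volume', alpha, rgb) or
    ('mesh', surface_rgb); a mesh hit is treated as opaque here."""
    radiance = np.zeros(3)
    throughput = 1.0
    for seg in segments:
        if seg[0] == "volume":
            _, alpha, rgb = seg
            radiance += throughput * alpha * np.asarray(rgb)
            throughput *= 1.0 - alpha              # volume absorbs part of the ray
        else:
            _, rgb = seg
            radiance += throughput * np.asarray(rgb)
            return radiance                        # opaque mesh terminates the path
    return radiance

print(composite_segments([("volume", 0.3, [0.2, 0.4, 0.6]),
                          ("mesh", [0.9, 0.1, 0.1])]))
```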
7. Limitations, Open Problems, and Future Work
- Memory–quality trade-off: High-quality representations (octrees, radiance grids) require large GPU memory (occasionally >1 GB), limiting scalability on edge devices. Quantization and lossy compression help, but remain an area for further development (Yu et al., 2021, Wang et al., 2024).
- Dynamic/unbounded scenes: Most real-time solutions target static or bounded domains; dynamic Fourier PlenOctrees and hybrid models offer partial solutions, but fully unbounded, time-varying NeRF rendering in real time remains challenging (Wang et al., 2022).
- Proxy geometry/mesh extraction: Rasterization pipelines are fundamentally limited by the accuracy of extracted proxy meshes; errors in NeRF’s underlying geometry (e.g., floaters, incomplete surfaces) may degrade appearance (Wan et al., 2023).
- Global illumination: Volumetric methods can be slow for path tracing; tiny-MLP SDF grids and mesh–NeRF hybrids show potential but require further optimization (Esposito et al., 2022, Qiao et al., 2023).
- Adaptive and learned sampling: Architectures that learn not only where to sample, but how representation complexity should be distributed spatially and temporally, may further improve speed–quality trade-offs (Kurz et al., 2022, Wang et al., 2023, Wang et al., 2025).
A continued integration of data structure design, representation decomposition, neural architecture specialization, and hardware-aware scheduling is central to progress in real-time neural rendering for novel view synthesis and interactive applications.