
Sparse Neural Radiance Grids (SNeRG)

Updated 11 May 2026
  • Sparse Neural Radiance Grids (SNeRG) are a hybrid 3D scene representation that converts dense neural evaluations into efficient, block-sparse voxel grids.
  • The architecture uses a deferred shading approach with two MLP stages to separate diffuse accumulation from view-dependent color refinement, ensuring high rendering quality.
  • A comprehensive baking pipeline involving quantization, anti-aliasing, and visibility culling reduces memory usage and enables real-time rendering on standard GPUs.

Sparse Neural Radiance Grids (SNeRG) are a hybrid volumetric scene representation and rendering architecture that enables real-time, photorealistic view synthesis of complex 3D scenes using compact, block-sparse voxel grids augmented with learned feature vectors. SNeRG transforms a trained Neural Radiance Field (NeRF)—which typically requires hundreds of multilayer perceptron (MLP) evaluations per-ray for image synthesis—into a sparse, quantized 3D grid structure suitable for rapid memory access and efficient GPU-based rendering. This approach achieves high-quality view-dependent effects and fine geometric detail, while significantly reducing memory footprint and computation for commodity hardware deployment (Hedman et al., 2021). The SNeRG design has also inspired related work in fast, 3D-aware generative modeling using sparse voxel grids (Schwarz et al., 2022).

1. Architectural Reformulation and Core Principles

SNeRG introduces a deferred NeRF architecture, structurally departing from the canonical NeRF MLP. Standard NeRF takes as input a 3D point $\mathbf{x} = \mathbf{r}(t)$ and a viewing direction $\mathbf{d}$, outputting density $\sigma(\mathbf{x})$ and color $c(\mathbf{x}, \mathbf{d})$. The rendered color along a ray is computed via the standard volume rendering integral:

$$\hat{C}(\mathbf{r}) = \int_0^{\infty} T(t)\,\sigma(\mathbf{r}(t))\,c(\mathbf{r}(t),\mathbf{d})\,dt,\quad T(t) = \exp\left(-\int_0^t \sigma(\mathbf{r}(s))\,ds\right)$$

which is then typically approximated with discrete quadrature (Hedman et al., 2021).
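The discrete quadrature mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `render_ray`, the sample values, and the uniform spacing are all assumptions made for the example.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Discrete quadrature of the volume rendering integral along one ray.

    sigmas: densities at the samples
    colors: RGB tuples at the samples
    deltas: distances between consecutive samples
    """
    transmittance = 1.0          # T at the ray origin: nothing has absorbed light yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha           # contribution of this sample
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha             # light surviving past the segment
    return pixel, transmittance

# Hypothetical ray: empty space, then a dense red surface, then a blue region behind it.
rgb, t_left = render_ray(
    sigmas=[0.0, 50.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

In this example the empty first sample contributes nothing, the red surface dominates the pixel, and the blue region behind it is almost fully occluded.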

The deferred SNeRG architecture splits the NeRF MLP into two stages:

  • MLP₁: $\mathbb{R}^3 \rightarrow (\sigma, \mathbf{c}_d, \mathbf{v})$
    • Outputs per-voxel density $\sigma$, diffuse color $\mathbf{c}_d \in [0,1]^3$, and a low-dimensional feature vector $\mathbf{v} \in [0,1]^4$.
  • MLP₂: $(\mathbf{V}, \mathbf{d}) \rightarrow \Delta \mathbf{c}_{spec}$
    • Small, per-pixel MLP (2 layers, 16 channels) that computes the view-dependent color residual from the opacity-weighted, ray-integrated feature $\mathbf{V}$ and the viewing direction $\mathbf{d}$.

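Rendering proceeds by accumulating the (opacity-weighted) diffuse color and features along each ray and performing a deferred shading pass for view-dependent appearance:

$$\mathbf{C}_d(\mathbf{r}) = \sum_k T(t_k)\,\alpha_k\,\mathbf{c}_d(\mathbf{x}_k),\quad \mathbf{V}(\mathbf{r}) = \sum_k T(t_k)\,\alpha_k\,\mathbf{v}(\mathbf{x}_k),\quad \hat{C}(\mathbf{r}) = \mathbf{C}_d(\mathbf{r}) + \mathrm{MLP}_2\big(\mathbf{V}(\mathbf{r}), \mathbf{d}\big)$$

where $\alpha_k = 1 - \exp(-\sigma(\mathbf{x}_k)\,\delta_k)$ is the opacity of sample $k$ and $\delta_k$ is the spacing between samples. Opacity regularization is imposed to promote sparsity around surfaces:

$$\mathcal{L}_s = \lambda_s \sum_{i,k} \log\left(1 + \frac{\sigma(\mathbf{r}_i(t_k))^2}{c}\right)$$

where $i$ indexes training rays, $k$ indexes samples along each ray, and $\lambda_s$ and $c$ are hyperparameters that set the weight and scale of the regularizer, which is applied to coarse samples only.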
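The deferred shading idea can be illustrated with a short sketch: diffuse color and features are accumulated per sample, and the view-dependent network runs once per ray. Everything named here is hypothetical. In particular, `make_mlp2` builds a stand-in network with random, untrained weights and a sigmoid output, which is an assumption for the example rather than the trained MLP₂ from the paper.

```python
import math
import random

def make_mlp2(n_feat=4, hidden=16, seed=0):
    """Hypothetical stand-in for MLP2 with random, untrained weights."""
    rng = random.Random(seed)
    n_in = n_feat + 3                      # accumulated features plus 3D view direction
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(3)]

    def mlp2(features, direction):
        x = list(features) + list(direction)
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU layer
        out = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
        return [1.0 / (1.0 + math.exp(-o)) for o in out]  # sigmoid output (assumed)
    return mlp2

def deferred_render(alphas, diffuse, feats, direction, mlp2):
    """Accumulate diffuse color and features along a ray, then shade once."""
    T = 1.0
    c_d = [0.0] * 3
    v = [0.0] * len(feats[0])
    for a, c, f in zip(alphas, diffuse, feats):
        w = T * a                                   # opacity-weighted contribution
        c_d = [acc + w * ci for acc, ci in zip(c_d, c)]
        v = [acc + w * fi for acc, fi in zip(v, f)]
        T *= 1.0 - a
    residual = mlp2(v, direction)                   # single network call per ray
    return [cd + r for cd, r in zip(c_d, residual)], v

pixel, accumulated = deferred_render(
    alphas=[0.0, 1.0, 0.5],
    diffuse=[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.9, 0.9, 0.9)],
    feats=[(1, 1, 1, 1), (0.2, 0.4, 0.6, 0.8), (1, 1, 1, 1)],
    direction=(0.0, 0.0, 1.0),
    mlp2=make_mlp2(),
)
```

The design point is the call count: a standard NeRF evaluates its view-dependent network at every sample, whereas here the network cost is independent of the number of samples along the ray.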

2. Sparse Voxel Grid Data Structure

SNeRG "bakes" a dense sampling of the trained NeRF MLP₁ onto a sparse, block-based 3D voxel grid aligned with the scene’s axis-aligned bounding box. For grid resolution $N^3$, the structure is as follows:

  • Macroblocks: Voxels are grouped into macroblocks of size $B^3$.
  • Indirection Grid: A grid of size $(N/B)^3$ storing either a null entry (“empty”) or an index into a compact 3D texture atlas.
  • Texture Atlas: Dense 3D arrays for all occupied macroblocks, each storing tuples $(\alpha, \mathbf{c}_d, \mathbf{v})$ per voxel, quantized to 8 bits per channel.
  • Voxel Quantization: Convert density to alpha via $\alpha = 1 - \exp(-\sigma v)$, with $v$ the voxel width (a scalar, distinct from the feature vector $\mathbf{v}$); $\mathbf{c}_d$ and $\mathbf{v}$ are constrained to $[0,1]$ for efficient quantization.
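A minimal sketch of the two-level structure described above, assuming a grid side length `n`, a macroblock side length `b`, and a `-1` sentinel for empty blocks. The class and method names are illustrative, and Python lists stand in for the GPU textures a real implementation would use.

```python
EMPTY = -1  # sentinel for an unoccupied macroblock (assumed encoding)

class SparseGrid:
    """Block-sparse voxel grid: an indirection grid pointing into a block atlas."""

    def __init__(self, n, b):
        assert n % b == 0, "grid side must be a multiple of the block side"
        self.n, self.b, self.nb = n, b, n // b
        self.indirection = [EMPTY] * (self.nb ** 3)   # one entry per macroblock
        self.atlas = []                               # dense storage, occupied blocks only

    def _block_id(self, x, y, z):
        b, nb = self.b, self.nb
        return ((x // b) * nb + (y // b)) * nb + (z // b)

    def _local_id(self, x, y, z):
        b = self.b
        return ((x % b) * b + (y % b)) * b + (z % b)

    def write(self, x, y, z, voxel):
        bid = self._block_id(x, y, z)
        if self.indirection[bid] == EMPTY:
            # Allocate a new dense block: 8 channels (alpha, RGB, 4 features) per voxel.
            self.indirection[bid] = len(self.atlas)
            self.atlas.append([(0,) * 8] * self.b ** 3)
        self.atlas[self.indirection[bid]][self._local_id(x, y, z)] = voxel

    def read(self, x, y, z):
        slot = self.indirection[self._block_id(x, y, z)]
        if slot == EMPTY:
            return None                               # empty space, nothing stored
        return self.atlas[slot][self._local_id(x, y, z)]

grid = SparseGrid(n=8, b=4)
grid.write(5, 1, 2, (255, 200, 10, 10, 1, 2, 3, 4))
```

Only macroblocks that receive a write consume atlas storage, so memory scales with the occupied volume rather than with the full grid.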

Block sparsity is enforced via visibility culling: macroblocks are discarded when their maximum opacity $\alpha$, or their maximum transmittance $T$ over all training cameras, falls below a small threshold (Hedman et al., 2021).
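The culling rule can be sketched as a simple filter over per-block statistics. The function names and the two threshold values below are placeholders chosen for illustration and are not taken from the source.

```python
def max_visibility(per_camera_transmittance):
    """Best-case transmittance of a block over all training cameras."""
    return max(per_camera_transmittance)

def cull_blocks(max_alpha, visibility, alpha_min, vis_min):
    """Return indices of macroblocks that are both opaque enough and visible enough."""
    keep = []
    for idx, (a, t) in enumerate(zip(max_alpha, visibility)):
        if a >= alpha_min and t >= vis_min:
            keep.append(idx)
    return keep

# Three hypothetical macroblocks: empty, visible surface, fully occluded interior.
max_alpha = [0.0, 0.9, 0.8]
per_camera = [[1.0, 0.9], [0.5, 0.2], [0.0001, 0.0]]
visibility = [max_visibility(cams) for cams in per_camera]
kept = cull_blocks(max_alpha, visibility, alpha_min=0.005, vis_min=0.01)  # placeholder thresholds
```

Here only the second block survives: the first holds no opaque content and the third is never seen from any training camera.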

3. Baking Pipeline and Data Compression

The SNeRG pipeline replaces per-frame NeRF evaluation with the following off-line scene baking procedure:

  1. Dense Evaluation: Sample MLP₁ densely at each voxel center in the $N^3$ grid; output raw $(\sigma, \mathbf{c}_d, \mathbf{v})$.
  2. Density-to-Alpha Conversion: Apply $\alpha = 1 - \exp(-\sigma v)$, where $v$ is the voxel width.
  3. Visibility Culling: Discard macroblocks with near-zero occupancy or visibility.
  4. Anti-Aliasing: For each surviving voxel, average the outputs of MLP₁ at 16 Gaussian-distributed spatial offsets within the voxel to compute anti-aliased $(\sigma, \mathbf{c}_d, \mathbf{v})$.
  5. Quantization: Store $(\alpha, \mathbf{c}_d, \mathbf{v})$ as 8-bit integers.
  6. Compression:
    • Indirection grid: Losslessly encoded as PNG slices.
    • Atlas: Can be compressed via PNG, JPEG, or lossy video codecs (e.g., H.264) at macroblock granularity.
  7. Optional Fine-Tuning: Freeze the baked values and retrain MLP₂ for ~100 epochs with Adam to recover quantization-induced fidelity loss.

Per-scene baking has $O(N^3)$ cost due to the dense grid evaluation, but this one-time cost is amortized over all subsequent rendering and typically takes on the order of 1 minute on a GPU (Hedman et al., 2021).
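Three of the baking steps (density-to-alpha conversion, anti-aliasing, and 8-bit quantization) are simple enough to sketch directly. The function names, the `spread` parameter for the Gaussian offsets, and the example density and grid size are assumptions for illustration only.

```python
import math
import random

def density_to_alpha(sigma, voxel_width):
    """Opacity of a voxel with density sigma and the given width."""
    return 1.0 - math.exp(-sigma * voxel_width)

def antialias(field, center, spread, n_samples=16, seed=0):
    """Average a scalar field over Gaussian-distributed offsets around a voxel center."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        p = tuple(c + rng.gauss(0.0, spread) for c in center)
        total += field(p)
    return total / n_samples

def quantize(value):
    """Map a value in [0, 1] to an 8-bit integer, clamping out-of-range input."""
    return max(0, min(255, int(round(value * 255.0))))

def dequantize(byte):
    """Map an 8-bit integer back to [0, 1]."""
    return byte / 255.0

# Hypothetical voxel: density 100 in a grid with 1000 voxels per unit length.
alpha = density_to_alpha(100.0, 1.0 / 1000.0)
stored = quantize(alpha)            # 24
recovered = dequantize(stored)
```

The round trip loses at most half a quantization step (1/510) per channel, which is the fidelity loss the optional fine-tuning of MLP₂ is meant to recover.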

4. Real-Time Rendering Algorithm

View synthesis with SNeRG replaces the costly per-ray MLP stack with accelerated texture lookups and a single residual MLP₂ evaluation per pixel:

  1. Coarse Ray Marching: Rays are marched through the indirection grid in coarse, macroblock-scale steps. Ray-box intersection tests efficiently skip empty macroblocks.
  2. Fine Marching in Occupied Blocks: Within non-empty blocks, rays advance with a fine step size; for each sample $\mathbf{x}_k$:
    • Look up the block index and fetch $\alpha$.
    • If $\alpha = 0$, continue. Otherwise, trilinearly interpolate $(\mathbf{c}_d, \mathbf{v})$ from adjacent voxels.
    • Accumulate $\mathbf{C}_d \leftarrow \mathbf{C}_d + T\,\alpha\,\mathbf{c}_d$, $\mathbf{V} \leftarrow \mathbf{V} + T\,\alpha\,\mathbf{v}$, and $T \leftarrow T\,(1 - \alpha)$.
    • Terminate once the accumulated opacity $1 - T$ saturates.
  3. Deferred Shading: Evaluate MLP₂ once per pixel, $\Delta\mathbf{c}_{spec} = \mathrm{MLP}_2(\mathbf{V}, \mathbf{d})$, and output $\hat{C}(\mathbf{r}) = \mathbf{C}_d + \Delta\mathbf{c}_{spec}$.

The GPU implementation uses three 8-bit 3D textures (for $\alpha$, RGB, and features) and GLSL for MLP₂ evaluation (Hedman et al., 2021).
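The marching loop can be sketched on the CPU as follows. This is a simplified illustration under several assumptions: the ray origin lies inside the grid and the direction is non-zero, coordinates are in voxel units, nearest-neighbor lookup replaces trilinear interpolation for brevity, and the `max_opacity` cutoff of 0.99 is a placeholder. The callables `occupied` and `fetch` are hypothetical stand-ins for the indirection grid and texture atlas.

```python
def march_ray(origin, direction, n, b, occupied, fetch, step=1.0, max_opacity=0.99):
    """March one ray through an n^3 grid with b^3 macroblocks.

    occupied(bx, by, bz) -> bool, fetch(ix, iy, iz) -> (alpha, rgb, features).
    Returns accumulated diffuse color, features, remaining transmittance, fetch count.
    """
    eps = 1e-6                       # nudge past block boundaries after a skip
    t, T = 0.0, 1.0
    c_d, feat_acc = [0.0] * 3, [0.0] * 4
    fetches = 0
    while 1.0 - T < max_opacity:     # early termination once the ray is nearly opaque
        p = [o + t * d for o, d in zip(origin, direction)]
        if any(c < 0.0 or c >= n for c in p):
            break                    # ray has left the grid
        blk = [int(c) // b for c in p]
        if not occupied(*blk):
            # Ray-box exit: jump straight to where the ray leaves this empty block.
            exits = []
            for o, d, k in zip(origin, direction, blk):
                if d > 0.0:
                    exits.append(((k + 1) * b - o) / d)
                elif d < 0.0:
                    exits.append((k * b - o) / d)
            t = min(exits) + eps
            continue
        alpha, rgb, feat = fetch(*(int(c) for c in p))
        fetches += 1
        if alpha > 0.0:
            w = T * alpha
            c_d = [acc + w * x for acc, x in zip(c_d, rgb)]
            feat_acc = [acc + w * x for acc, x in zip(feat_acc, feat)]
            T *= 1.0 - alpha
        t += step
    return c_d, feat_acc, T, fetches

# Hypothetical scene: an 8^3 grid where only macroblock (1, 0, 0) holds half-opaque red voxels.
color, feats, T_left, n_fetch = march_ray(
    origin=(0.5, 0.5, 0.5), direction=(1.0, 0.0, 0.0), n=8, b=4,
    occupied=lambda bx, by, bz: (bx, by, bz) == (1, 0, 0),
    fetch=lambda ix, iy, iz: (0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)),
)
```

In this example the first four voxels along the ray are skipped with a single jump, so only the four voxels inside the occupied block are fetched.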

5. Memory Footprint and Computational Performance

The SNeRG pipeline achieves compact, scalable memory usage and real-time rendering rates:

  • Compressed Memory: approximately 86 MB per scene (synthetic 360°) and approximately 50 MB for real scans; stored as compressed PNG or JPEG.
  • Uncompressed GPU Memory: approximately 1.7 GB per scene.
  • Performance:
    • Synthetic scenes: approximately 84 fps (AMD Radeon 5500M, 85 W, 0.99 fps/W).
    • Real 360° scenes: 40–55 fps.
    • Forward-facing (real): 27–60 fps.
  • Scene Baking: on the order of 1 minute on a GPU with $O(N^3)$ MLP evaluations.
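As a rough worked example of how these figures arise, the sketch below estimates uncompressed storage from the number of occupied macroblocks and checks the quoted power efficiency. The 8 bytes per voxel follow from the stored channels (alpha, three diffuse color channels, four feature channels, 8 bits each); the grid size, block size, occupancy fraction, and 4-byte indirection entries are hypothetical values, not measurements from the paper.

```python
BYTES_PER_VOXEL = 1 + 3 + 4   # alpha + RGB diffuse + 4 feature channels, 8 bits each

def uncompressed_bytes(n, b, occupied_blocks, index_bytes=4):
    """Atlas plus indirection-grid size in bytes (index width is an assumption)."""
    atlas = occupied_blocks * b ** 3 * BYTES_PER_VOXEL
    indirection = (n // b) ** 3 * index_bytes
    return atlas + indirection

# Hypothetical 1024^3 grid with 32^3 macroblocks.
total_blocks = (1024 // 32) ** 3
dense = uncompressed_bytes(n=1024, b=32, occupied_blocks=total_blocks)
sparse = uncompressed_bytes(n=1024, b=32, occupied_blocks=total_blocks // 10)

# Power efficiency quoted for synthetic scenes: 84 fps on an 85 W GPU.
fps_per_watt = 84 / 85
```

The atlas term dominates, so storage shrinks almost in proportion to the fraction of macroblocks that survive culling, and the 84 fps at 85 W figure rounds to the stated 0.99 fps/W.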

Quantization and block-level sparsity enable practical deployment but may cause slight blurring and potential “floating alpha” artifacts, which fine-tuning of MLP₂ can address (Hedman et al., 2021).

6. Related Work in 3D-Aware Generative Modeling

Sparse block-based voxel grids for radiance field parameterization have influenced broader 3D-aware generative modeling trends. Notably, the VoxGRAF architecture ("Fast 3D-Aware Image Synthesis with Sparse Voxel Grids") implements a SNeRG-style grid with progressive grid growing, free-space pruning, and 3D convolutional scene generators, omitting per-ray MLP evaluations at inference time (Schwarz et al., 2022). The core representation retains two fields per voxel: density $\sigma$ and RGB color $\mathbf{c}$, supported by active-voxel bitmasks/compressed coordinate lists and modular 3D CNN inference.

VoxGRAF demonstrates that (1) sparse 3D CNNs can replace MLPs when trained with appropriate regularization, (2) real-time novel-view rendering (up to 167 FPS amortized) and competitive image fidelity (FID = 9.6 on FFHQ) are achievable, and (3) memory requirements (e.g., approximately 0.9 GB with 95% sparsity) are substantially lower than those of dense radiance field models (Schwarz et al., 2022).

7. Advantages, Limitations, and Practical Considerations

Advantages:

  • Real-time view synthesis on commodity hardware via block-skipping and atomic trilinear lookups.
  • Compact encoding (tens of MB), suitable for storage- and bandwidth-constrained contexts.
  • Preservation of view-dependent and geometric detail with high rendering quality.
  • Implementation with standard GPU 3D texture primitives and a small residual MLP.

Limitations:

  • Quantization and block-based sparsity can induce minor visual artifacts (blurring, floating alpha holes).
  • Fixed voxel grid resolution incurs a trade-off between speed and rendering quality; higher $N$ yields better fidelity at increased memory and bake time costs.
  • Scene-specific preprocessing (scene bounding box, dense training coverage) is required for effective visibility culling and block sparsity.
  • Baking is an offline step with $O(N^3)$ MLP₁ invocations.
  • The approach is not suited to highly dynamic or deformable scenes without full re-baking.

In summary, Sparse Neural Radiance Grids provide an efficient, compact, and hardware-friendly method for 3D scene representation and real-time novel view synthesis, with technical features and limitations rigorously characterized in the foundational study (Hedman et al., 2021) and extended in 3D-aware synthesis methods such as VoxGRAF (Schwarz et al., 2022).
