
Instant Neural Graphics Primitives (iNGP)

Updated 11 November 2025
  • Instant Neural Graphics Primitives (iNGP) are neural field representations that combine multi-resolution hash tables with small MLPs for efficient, real-time 3D reconstruction and view synthesis.
  • The design shifts most computational burden to trainable, spatially adaptive hash encodings, yielding orders-of-magnitude speedups and reduced memory usage compared to conventional methods.
  • A fused GPU implementation with CUDA kernels enables rapid convergence and high-fidelity rendering; extensions of the framework include satellite-imagery 3D reconstruction and simplex-based encodings.

Instant Neural Graphics Primitives (iNGP) refer to a class of neural field representations in which a compact neural network (typically a tiny multilayer perceptron) is combined with a multi-resolution hash-table encoding of spatial coordinates. This design enables the efficient parameterization, rapid training, and real-time inference of implicit graphics primitives—such as signed distance fields (SDF) and neural radiance fields (NeRF)—across tasks including novel view synthesis, 3D reconstruction, and image fitting. The iNGP pipeline achieves orders-of-magnitude speedups over prior neural representations by shifting most of the representational burden to data-adaptive, trainable hash-based lookup tables that precondition the input space hierarchically, thereby allowing extremely compact and shallow networks to suffice for high-fidelity reconstruction.

1. Multi-Resolution Hash Encoding

At the core of iNGP is a spatial encoding that maps an input coordinate $x \in \mathbb{R}^d$ to a concatenated feature vector by querying several hash tables indexed at different spatial resolutions. For each level $\ell = 0, \ldots, L-1$, the input coordinate is quantized onto a $2^\ell$-spaced grid: $u_\ell = \lfloor 2^\ell x \rfloor \in \mathbb{Z}^d$. The high-dimensional grid index $u_\ell$ is reduced via modular hashing: $h_\ell(x) = (u_\ell \bmod T_\ell) \in \{0, \ldots, T_\ell - 1\}$, with $T_\ell$ typically $2^{19}$ for practical memory usage.

Each hash table stores $T_\ell$ trainable feature vectors $\mathbf{t}_\ell[k] \in \mathbb{R}^F$, where $F$ is usually $2$ for radiance field tasks and $4$ for SDFs. For a given $x$, the features at all $L$ levels are retrieved and concatenated:

$$f(x) = \left[\mathbf{t}_0[h_0(x)];\ \mathbf{t}_1[h_1(x)];\ \ldots;\ \mathbf{t}_{L-1}[h_{L-1}(x)]\right] \in \mathbb{R}^{L \cdot F}$$

The total parameter count for all hash tables is $P_{\text{hash}} = \sum_{\ell=0}^{L-1} T_\ell \cdot F$, typically in the low tens of millions even for complex scenes, which is still an order of magnitude smaller than the expanded input dimensionality of high-frequency Fourier or sinusoidal encodings. The trainable hash tables enable the architecture to adaptively allocate capacity to scene regions with high spatial complexity, handling local frequency variation without global basis expansion.
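The lookup-and-concatenate scheme above can be sketched in NumPy. Hyperparameters here are deliberately tiny so the sketch runs instantly, and the per-dimension reduction to a single scalar index is an assumption: the text's elementwise mod does not yield a scalar, so this sketch combines dimensions with prime multipliers and XOR in the style of the reference implementation before applying the mod.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (the paper's defaults are larger; these are
# chosen small so the sketch runs instantly).
L, F, T, d = 4, 2, 2**10, 3  # levels, features per entry, table size, input dim

# One trainable table of T feature vectors per level.
tables = [rng.standard_normal((T, F)).astype(np.float32) for _ in range(L)]

# Per-dimension prime multipliers used by the reference spatial hash.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.int64)

def hash_encode(x):
    """Map x in [0, 1]^d to the concatenated L*F feature vector.

    Quantize onto a 2^l-spaced grid, reduce the integer grid index to a
    scalar via a prime-multiply/XOR spatial hash, then take it mod T.
    """
    feats = []
    for l, table in enumerate(tables):
        u = np.floor((2**l) * x).astype(np.int64)        # u_l = floor(2^l x)
        k = int(np.bitwise_xor.reduce(u * PRIMES) % T)   # h_l(x)
        feats.append(table[k])                           # t_l[h_l(x)]
    return np.concatenate(feats)                         # shape (L*F,)

f = hash_encode(np.array([0.3, 0.7, 0.1]))
print(f.shape)  # (8,) = L * F
```

In the real system the table entries are gradient-updated alongside the MLP, so collisions at coarse levels are disambiguated by finer levels rather than resolved explicitly.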

2. Compact Neural Network Architecture

The concatenated hash-encoded feature $f(x)$ serves as input to a compact multilayer perceptron (MLP) of depth four (hidden width $\approx 64$). The design for NeRF-style tasks uses:

  • Input: $L \cdot F$ features (e.g., $16 \times 2 = 32$).
  • Four hidden fully-connected layers of width 64, with ReLU activations.
  • A skip connection concatenates $f(x)$ to the activation of the third hidden layer.
  • Output heads: scalar density $\sigma$ (with softplus activation) and RGB color $\mathbf{c}$ (with sigmoid activation).

The forward pass is:

$$\begin{align*}
z^0 &= f(x) \\
h^1 &= \phi(W^1 z^0 + b^1) \\
h^2 &= \phi(W^2 h^1 + b^2) \\
h^3_{\text{pre}} &= \phi(W^3 h^2 + b^3) \\
h^3 &= [h^3_{\text{pre}};\, z^0] \\
h^4 &= \phi(W^4 h^3 + b^4) \\
\sigma &= \mathrm{softplus}(w^\sigma h^4 + b^\sigma) \\
\mathbf{c} &= \mathrm{sigmoid}(w^c h^4 + b^c)
\end{align*}$$

The total MLP parameter count is $\mathcal{O}(10^4)$, a reduction of several orders of magnitude compared to conventional neural field networks.
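The forward pass can be sketched directly in NumPy. This is a shape-level illustration with random, untrained weights (not the fused CUDA implementation); dimensions follow the defaults quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
in_dim, width = 32, 64   # L*F = 16*2 = 32 input features, width-64 layers

def dense(n_in, n_out):
    """Random weight matrix and zero bias for one fully-connected layer."""
    return rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out)

W1, b1 = dense(in_dim, width)
W2, b2 = dense(width, width)
W3, b3 = dense(width, width)
W4, b4 = dense(width + in_dim, width)   # layer 4 sees the skip concatenation
w_sigma, b_sigma = dense(width, 1)
w_c, b_c = dense(width, 3)

relu = lambda z: np.maximum(z, 0.0)
softplus = lambda z: np.log1p(np.exp(z))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(f_x):
    """Mirror the equations above for a single encoded point f(x)."""
    z0 = f_x
    h1 = relu(W1 @ z0 + b1)
    h2 = relu(W2 @ h1 + b2)
    h3_pre = relu(W3 @ h2 + b3)
    h3 = np.concatenate([h3_pre, z0])          # skip connection
    h4 = relu(W4 @ h3 + b4)
    sigma = softplus(w_sigma @ h4 + b_sigma)   # non-negative density
    c = sigmoid(w_c @ h4 + b_c)                # RGB in (0, 1)
    return sigma, c

sigma, c = forward(rng.standard_normal(in_dim))
```

Counting the weight matrices confirms the scale: $32{\cdot}64 + 64{\cdot}64 + 64{\cdot}64 + 96{\cdot}64 \approx 1.6 \times 10^4$ parameters, i.e. $\mathcal{O}(10^4)$.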

3. Optimization and Loss Functions

The hash tables $\{\mathbf{t}_\ell\}$ and the neural network weights $\Theta$ are jointly optimized via Adam. Learning rates are tuned independently for the hash ($\alpha_{\text{hash}}$) and network ($\alpha_{\text{net}}$) parameters (e.g., $10^{-2}$ and $10^{-3}$, respectively), with exponential decay.
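The two-rate schedule can be illustrated with plain gradient steps. Adam's moment estimates are omitted for brevity, and the toy loss and decay factor are illustrative assumptions; only the per-group learning rates come from the text.

```python
import numpy as np

# Toy parameter vectors standing in for hash tables and MLP weights.
hash_params = np.ones(4)
net_params = np.ones(4)

alpha_hash, alpha_net = 1e-2, 1e-3   # per-group rates, as quoted in the text
decay = 0.95                          # exponential decay per step (illustrative)

for step in range(10):
    # Gradients of a toy quadratic loss 0.5 * ||p||^2, i.e. grad = p.
    g_hash, g_net = hash_params, net_params
    scale = decay ** step
    hash_params = hash_params - alpha_hash * scale * g_hash
    net_params = net_params - alpha_net * scale * g_net
```

The larger hash-table rate reflects that each table entry receives sparse, localized gradients, so it can tolerate more aggressive updates than the densely-updated MLP weights.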

For NeRF reconstruction, the per-ray loss is:

$$\mathcal{L} = \sum_{p \in \text{pixels}} \left\| C_{\text{pred}}(p; \Theta, \{\mathbf{t}_\ell\}) - C_{\text{gt}}(p) \right\|^2$$

Here, $C_{\text{pred}}$ is the color rendered via volumetric integration, and $C_{\text{gt}}$ is the target pixel color.
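A minimal sketch of the volumetric integration behind $C_{\text{pred}}$ and the squared-error loss, using the standard NeRF-style quadrature. The sample densities, colors, and step sizes below are toy values; in practice samples come from ray marching, typically accelerated by an occupancy grid.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities/colors into one pixel color:
      alpha_i = 1 - exp(-sigma_i * delta_i)
      T_i     = prod_{j<i} (1 - alpha_j)     (accumulated transmittance)
      C       = sum_i T_i * alpha_i * c_i
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

def photometric_loss(pred_pixels, gt_pixels):
    """Sum of squared color errors over pixels, matching the loss above."""
    return float(np.sum((pred_pixels - gt_pixels) ** 2))

# Toy ray with 4 samples: only the middle two carry density.
sigmas = np.array([0.0, 5.0, 5.0, 0.0])
deltas = np.full(4, 0.25)
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
C_pred = render_ray(sigmas, colors, deltas)
loss = photometric_loss(C_pred[None], np.array([[0.0, 1.0, 0.0]]))
```

Because the compositing weights are differentiable in $\sigma$ and $\mathbf{c}$, the loss gradient flows through the MLP into the hash-table entries touched by each sample.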

For SDF fitting:

$$\mathcal{L} = \sum_i \left|\Phi_{\text{pred}}(x_i) - \Phi_{\text{gt}}(x_i)\right|^2 + \lambda \sum_i \left|\, \|\nabla \Phi_{\text{pred}}(x_i)\| - 1 \,\right|^2$$

with the Eikonal constraint enforcing signed-distance regularity. No additional regularization is used; the hash tables function as implicit regularizers by concentrating capacity where needed.
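The SDF objective is straightforward to write down given predicted distances and their spatial gradients (in practice obtained by autodiff or finite differences). The weight $\lambda$ below is an illustrative assumption, not a value from the text.

```python
import numpy as np

def sdf_loss(phi_pred, phi_gt, grad_pred, lam=0.1):
    """Data term plus Eikonal penalty, matching the formula above.

    phi_pred, phi_gt : (N,) predicted / target signed distances
    grad_pred        : (N, 3) spatial gradients of the predicted SDF
    lam              : Eikonal weight (illustrative choice)
    """
    data = np.sum((phi_pred - phi_gt) ** 2)
    grad_norm = np.linalg.norm(grad_pred, axis=1)
    eikonal = np.sum((grad_norm - 1.0) ** 2)   # unit-gradient constraint
    return float(data + lam * eikonal)

# A perfect SDF (exact distances, unit-norm gradients) incurs zero loss.
phi = np.array([0.1, -0.2, 0.3])
grads = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
print(sdf_loss(phi, phi, grads))  # 0.0
```

The Eikonal term penalizes any deviation of $\|\nabla \Phi\|$ from 1, which is exactly the property distinguishing a true signed distance field from an arbitrary implicit function with the same zero level set.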

4. GPU Implementation and Real-Time Training

The full encoding and MLP are implemented as fully fused CUDA kernels. All hash lookups, feature concatenation, and MLP layers are composed in a single kernel launch, minimizing global memory traffic. For a batched inference/training pass, hash lookups from all levels for each point are performed in sequence, using coalesced memory access patterns, and exploiting 16-bit storage for both hash features and network weights to maximize arithmetic throughput.

This fused architecture eliminates intermediate memory reads/writes, reducing wasted bandwidth and increasing computational efficiency. On contemporary hardware, such as an NVIDIA A100, iNGP systems can converge to high-fidelity solutions in under $15$ seconds for standard NeRF scenes, with inference rendering of $1920 \times 1080$ images at over $200$ Hz (32 samples per ray).
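The memory effect of half-precision storage is easy to verify directly; the table shape below uses the defaults quoted earlier ($L = 16$, $T = 2^{19}$, $F = 2$). The kernel-fusion aspect itself cannot be demonstrated in Python, so this sketch covers only the storage side.

```python
import numpy as np

# Hash-table features stored at 16-bit precision halve memory footprint and
# traffic relative to float32.
L, T, F = 16, 2**19, 2
table_fp32 = np.zeros((L, T, F), dtype=np.float32)
table_fp16 = table_fp32.astype(np.float16)

print(table_fp32.nbytes // 2**20)  # 64 MiB
print(table_fp16.nbytes // 2**20)  # 32 MiB
```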

5. Empirical Performance and Comparison

Comparison against classic NeRF architectures (e.g., a $60$M-parameter MLP with large positional encodings):

  • Training time reduced from tens of minutes/hours to seconds.
  • Memory bandwidth requirements decrease by a factor of $\sim 20$.
  • Computation measured in FLOPs drops by a factor of $\sim 10$.
  • Reconstruction quality (PSNR, SSIM) improves: typical gains of $+1$ to $+2$ dB PSNR and several percentage points in SSIM.
  • Total hash+MLP storage (e.g., $20$M + $20$K parameters) is significantly below prior methods, while achieving equal or better expressivity.
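The storage figures in the last bullet can be checked with back-of-the-envelope arithmetic from the defaults in Sections 1 and 2 (biases omitted; the exact totals depend on configuration):

```python
# Hash tables: L levels, T entries per level, F features per entry.
hash_params = 16 * 2**19 * 2

# MLP: four hidden width-64 layers (with a 32-feature skip into layer 4)
# plus density and color heads; weight matrices only.
mlp_params = (32 * 64 + 64 * 64 + 64 * 64 + (64 + 32) * 64
              + 64 * 1 + 64 * 3)

print(hash_params)  # 16777216 -> "low tens of millions"
print(mlp_params)   # 16640    -> O(10^4)
```

The asymmetry is the point of the design: nearly all capacity lives in the cheap-to-query tables, while the MLP stays small enough to fuse into a single kernel.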

This encoding supports strong locality and data adaptivity, handling real-world detail (e.g., thin structures, fine gradients) with greater parameter and runtime efficiency.

6. Extensions and Variants

The multi-resolution hash encoding and fast sampling MLP framework has been adapted for diverse settings:

  • Satellite imagery 3D reconstruction: SAT-NGP (Billouard et al., 27 Mar 2024) couples iNGP's hash-encoding and occupancy grid sampling to achieve relightable, transient-free neural reconstructions of satellite-captured urban scenes, converging in $8$–$14$ minutes on a single $12$ GB GPU. The model replaces classical, computationally intense NeRF backbones (e.g., an 8-layer, $512$-unit MLP) with a $2 \times 64$ MLP, integrates robust loss reweighting, and supports explicit lighting vector injection for dynamic relighting.
  • Simplex-based encodings (Wen et al., 2023): These generalize the grid hash to $n$-simplices, reducing neighbor lookup to $n+1$ (from $2^n$ in grids), with skewed coordinate transforms and barycentric interpolation. Empirical results show a $\sim 9.4\%$ speedup in $2$D image fitting, and up to a $41.2\%$ speedup on dense volumetric tasks, with equivalent quality and lower memory waste as dimension increases.

| Method | Hash Structure | Neighbor Count ($n$D) | Typical Speedup | Quality |
|---|---|---|---|---|
| iNGP (grid) | Grid | $2^n$ | Baseline | Reference |
| Simplex-based | Simplices | $n+1$ | $\sim 1.1$–$1.4\times$ | Same PSNR/SSIM |
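The neighbor-count gap in the table grows quickly with dimension, since interpolation in a grid cell touches all $2^n$ corners while an $n$-simplex has only $n+1$ vertices:

```python
# Interpolation-neighbor counts per sample: grid corners vs. simplex vertices.
neighbors = {n: (2**n, n + 1) for n in (2, 3, 4, 5)}
for n, (grid, simplex) in neighbors.items():
    print(n, grid, simplex)  # e.g. 5D: 32 corners vs. 6 vertices
```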

This suggests that as problem dimensionality rises, simplex-based hash encoding offers substantial scalability advantages over grid-based iNGP, both in memory and computational efficiency.

7. Significance and Theoretical Insights

The essential mechanism underlying iNGP's speed and quality is the separation of concerns between spatial encoding and neural modeling. By leveraging trainable, multi-resolution hash tables, the architecture conditions the MLP on features that are spatially and spectrally adapted to the task, relieving the network from having to untangle high-dimensional frequency content globally. Local scene complexity is handled by the hash features, while the MLP decodes these into outputs (density, color, signed distance) with minimal depth and parameterization.

In contrast to fixed high-frequency embeddings, which require wide and deep networks to achieve similar expressivity, iNGP architectures achieve efficient gradient flow, compact storage, and hardware-level parallelism. This combination creates a practical path for real-time neural rendering, rapid fitting of 3D scenes, and on-the-fly adaptation of graphics primitives across application domains.

A plausible implication is that as implementations continue to optimize memory access patterns (e.g., fusing occupancy-grid updates, quantizing hash tables), and as further hierarchical or non-grid structures (simplex, tree, etc.) are explored, the iNGP paradigm will remain central to the emerging field of neural graphics primitives, especially for applications requiring both high fidelity and interactive rates.
