BakedSDF: Real-Time 3D Scene Reconstruction
- BakedSDF is a 3D scene reconstruction system that transforms multi-view imagery into high-quality, mesh-based representations for real-time photorealistic view synthesis.
- It employs a three-stage pipeline—hybrid neural modeling, mesh extraction (‘baking’), and per-vertex spherical Gaussian appearance encoding—to optimize real-time rendering.
- Performance metrics show BakedSDF delivers higher speed, accuracy, and energy efficiency than previous methods, enabling applications like material editing and physics simulation.
BakedSDF is a real-time 3D scene reconstruction and view synthesis system that transforms multi-view imagery of large, unbounded real-world environments into high-quality triangle meshes equipped with efficient, compact, view-dependent appearance models. The resulting representations support photorealistic novel view synthesis and are optimized for commodity GPU hardware, leveraging accelerated polygon rasterization pipelines instead of per-frame neural network inference. BakedSDF employs a three-stage pipeline: hybrid neural scene modeling, mesh extraction and refinement ("baking"), and per-vertex appearance encoding with spherical Gaussians, followed by a dedicated optimization stage for high-fidelity rendering. This approach demonstrates state-of-the-art accuracy, speed, and efficiency for real-time rendering and facilitates downstream applications such as material editing and physics simulation (Yariv et al., 2023).
1. Hybrid Neural Volume–Surface Representation
BakedSDF builds upon VolSDF's hybrid volumetric and surface formulation but operates in "contracted" coordinate space (following mip-NeRF 360, Equation 3; a sketch of the contraction follows the list below), which maps unbounded scenes into bounded domains and regularizes distant geometry. The core neural representation includes:
- A small "proposal" MLP (4×256, swish activations) predicting coarse densities for importance sampling.
- A larger NeRF-style SDF MLP (8×1024, swish activations) predicting a scalar signed distance and a 256-dimensional appearance bottleneck.
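As noted above, both MLPs operate on contracted coordinates. A minimal sketch of the mip-NeRF 360 contraction (the function name and the numerical guard are ours):

```python
import numpy as np

def contract(x: np.ndarray) -> np.ndarray:
    """mip-NeRF 360 scene contraction: the identity inside the unit ball,
    while points outside are pulled into the shell 1 < ||x'|| < 2."""
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-9)
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))
```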
The signed distance field (SDF) $f(\mathbf{x})$, taken positive outside the surface, implicitly defines the surface $\mathcal{S} = \{\mathbf{x} : f(\mathbf{x}) = 0\}$. Following VolSDF, the volumetric density is parameterized by

$$\sigma(\mathbf{x}) = \alpha \, \Psi_\beta\!\left(-f(\mathbf{x})\right),$$

where $\Psi_\beta$ is the zero-mean Laplace CDF of scale $\beta$ and $\alpha = \beta^{-1}$. As $\beta \to 0$, the density approaches a "thin" surface.
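A minimal numerical sketch of this density parameterization (function names are ours):

```python
import numpy as np

def laplace_cdf(s: np.ndarray, beta: float) -> np.ndarray:
    """CDF of a zero-mean Laplace distribution with scale beta."""
    return np.where(s <= 0.0,
                    0.5 * np.exp(np.minimum(s, 0.0) / beta),
                    1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta))

def density_from_sdf(f: np.ndarray, beta: float) -> np.ndarray:
    """VolSDF-style density alpha * Psi_beta(-f) with alpha = 1/beta.
    As beta shrinks, density concentrates into a thin shell at f == 0."""
    return (1.0 / beta) * laplace_cdf(-f, beta)
```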
Regularization is enforced by the Eikonal loss

$$\mathcal{L}_{\text{eik}} = \mathbb{E}_{\mathbf{x}}\Big[\big(\lVert \nabla_{\mathbf{x}} f(\mathbf{x}) \rVert_2 - 1\big)^2\Big],$$

encouraging $f$ to behave as a proper signed distance function with unit-norm gradient.
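A sketch of this penalty using JAX autodiff; `sdf` stands in for any differentiable scalar field, and all names here are ours:

```python
import jax
import jax.numpy as jnp

def eikonal_loss(sdf, points: jnp.ndarray) -> jnp.ndarray:
    """Mean squared deviation of ||grad f|| from 1 over sampled points.
    sdf: function mapping a 3-vector to a scalar signed distance."""
    grads = jax.vmap(jax.grad(sdf))(points)       # (N, 3) spatial gradients
    grad_norms = jnp.linalg.norm(grads, axis=-1)  # (N,)
    return jnp.mean((grad_norms - 1.0) ** 2)
```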
The scale $\beta$ is annealed from an initial value $\beta_0$ to a final value $\beta_1$ according to

$$\beta_t = \frac{\beta_0}{1 + \left(\frac{\beta_0 - \beta_1}{\beta_1}\right) t^{0.8}},$$

where $t \in [0, 1]$ is the normalized training progress, yielding initially soft, then sharp, surfaces as training progresses.
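As a sketch (the endpoint values stay symbolic, since the section does not state them):

```python
def beta_schedule(t: float, beta0: float, beta1: float) -> float:
    """Anneal the Laplace scale from beta0 (at t=0) down to beta1 (at t=1).
    The t**0.8 exponent makes beta fall faster early in training."""
    return beta0 / (1.0 + ((beta0 - beta1) / beta1) * t ** 0.8)
```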
View-dependent color is modeled using a diffuse branch $\mathbf{c}_d(\mathbf{x})$ and a specular branch $\mathbf{c}_s(\mathbf{x}, \hat{\boldsymbol{\omega}}_r)$, where $\hat{\boldsymbol{\omega}}_r$ is the viewing direction reflected about the surface normal $\hat{\mathbf{n}} = \nabla f / \lVert \nabla f \rVert$. The per-ray color is computed via standard NeRF quadrature:

$$\mathbf{C}(\mathbf{r}) = \sum_i w_i \, \mathbf{c}_i, \qquad w_i = T_i \big(1 - e^{-\sigma_i \delta_i}\big), \qquad T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big),$$

where $\delta_i$ is the length of the $i$-th ray interval.
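A minimal single-ray sketch of this quadrature (array shapes are our assumption):

```python
import numpy as np

def composite_ray(sigmas: np.ndarray, deltas: np.ndarray,
                  colors: np.ndarray) -> np.ndarray:
    """Standard NeRF alpha compositing along one ray.
    sigmas, deltas: (S,) densities and interval lengths; colors: (S, 3)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # per-interval opacity
    tau = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    trans = np.exp(-tau)                                    # transmittance T_i
    weights = trans * alphas                                # w_i = T_i * alpha_i
    return np.sum(weights[:, None] * colors, axis=0)
```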
The objective function combines the photometric loss

$$\mathcal{L}_{\text{data}} = \mathbb{E}_{\mathbf{r}}\Big[\lVert \mathbf{C}(\mathbf{r}) - \mathbf{C}_{\text{gt}}(\mathbf{r}) \rVert_2^2\Big]$$

with a weighted Eikonal term $\mathcal{L}_{\text{eik}}$ and the mip-NeRF 360 proposal loss. Optimization is carried out for 250,000 Adam steps.
2. Mesh Extraction and Refinement (“Baking”)
After convergence, BakedSDF evaluates the learned SDF $f$ on a regular grid in contracted space. Grid points are retained if their volumetric rendering weight exceeds 0.005 or if the proposal MLP marks them as "non-empty," thereby reducing "floaters" in unobserved regions.
The level set is extracted via Marching Cubes, at an iso-value chosen to compensate for the Laplace blur of the density. To repair small mesh holes, 32 iterations of region growing are performed: for each current mesh vertex, its surrounding grid neighborhood is reactivated and Marching Cubes is rerun locally.
The mesh is then mapped back to Euclidean world space. A vertex-reordering pass (as in Sander et al. 2007) is applied to optimize for GPU vertex-cache locality.
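A minimal sketch of the extraction built on scikit-image's Marching Cubes; the grid construction, weight thresholds, and region-growing repair are elided, and all names are ours:

```python
import numpy as np
from skimage import measure

def extract_mesh(sdf_grid: np.ndarray, keep_mask: np.ndarray,
                 voxel_size: float, iso: float = 0.0):
    """Run Marching Cubes on a dense SDF grid, skipping culled cells.
    sdf_grid: (R, R, R) signed distances; keep_mask: (R, R, R) bools."""
    # Push culled cells far outside the surface so they emit no triangles.
    masked = np.where(keep_mask, sdf_grid, 1e6)
    verts, faces, normals, _ = measure.marching_cubes(
        masked, level=iso, spacing=(voxel_size,) * 3)
    return verts, faces, normals
```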
3. Compact Per-Vertex Spherical Gaussian Appearance Model
Each mesh vertex is assigned:
- A diffuse color $\mathbf{c}_d \in \mathbb{R}^3$.
- $N$ view-dependent spherical Gaussian (SG) lobes, each parameterized by $(\boldsymbol{\mu}_i, \mathbf{c}_i, \lambda_i)$: $\boldsymbol{\mu}_i$ is a unit mean direction, $\mathbf{c}_i \in \mathbb{R}^3$ the RGB weight, and $\lambda_i > 0$ the sharpness.
Given a camera ray of direction $\hat{\mathbf{d}}$, the $i$-th SG contributes

$$G_i(\hat{\mathbf{d}}) = \mathbf{c}_i \exp\big(\lambda_i (\boldsymbol{\mu}_i \cdot \hat{\mathbf{d}} - 1)\big),$$

with the final fragment color

$$\mathbf{C}(\hat{\mathbf{d}}) = \mathbf{c}_d + \sum_{i=1}^{N} G_i(\hat{\mathbf{d}}).$$
In practice, three SGs are used for vertices in the central "near" region of the contracted space and one SG for vertices in the "far" periphery, balancing view-dependent quality against per-vertex storage.
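Because each fragment then costs only a few multiply-adds and exponentials, this model rasterizes cheaply; a sketch of the evaluation (shapes are our assumption):

```python
import numpy as np

def shade_fragment(view_dir: np.ndarray, diffuse: np.ndarray,
                   sg_mu: np.ndarray, sg_color: np.ndarray,
                   sg_sharpness: np.ndarray) -> np.ndarray:
    """Diffuse color plus a sum of N spherical Gaussian lobes.
    view_dir: (3,) unit vector; sg_mu: (N, 3) unit lobe directions;
    sg_color: (N, 3) RGB weights; sg_sharpness: (N,) lambdas."""
    cos = sg_mu @ view_dir                        # (N,) cosines to lobe means
    lobes = np.exp(sg_sharpness * (cos - 1.0))    # each peaks at 1 when aligned
    return diffuse + (lobes[:, None] * sg_color).sum(axis=0)
```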
4. Optimization of the Baked Mesh and SGs
To optimize appearance, the mesh is rasterized into each training view (1920×1080) and, per pixel $p$, the triangle identifier and barycentric coordinates are recorded. Per-vertex attributes are refined by minimizing a per-pixel color loss built on the general robust function $\rho(\cdot, \alpha, c)$ of Barron [2019]:

$$\mathcal{L} = \sum_{p} \rho\big(\lVert \hat{\mathbf{C}}_p - \mathbf{C}_p \rVert, \alpha, c\big),$$

where $\hat{\mathbf{C}}_p$ is the rendered color and $\mathbf{C}_p$ the ground truth; an additional global "clear color" is optimized for pixels not covered by any mesh triangle.
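A sketch of the general robust function $\rho$ for the non-degenerate case $\alpha \notin \{0, 2\}$; the $\alpha$ and $c$ values BakedSDF uses are not stated in this section, so they remain free parameters:

```python
import numpy as np

def robust_loss(x: np.ndarray, alpha: float, c: float) -> np.ndarray:
    """General robust loss of Barron [2019] for alpha not in {0, 2}:
    rho(x) = |a-2|/a * (((x/c)^2 / |a-2| + 1)^(a/2) - 1), a = alpha."""
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)
```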
Scaling to scenes with hundreds of millions of vertices is achieved by parameterizing per-vertex attributes with an Instant-NGP hash grid (18 resolution levels). Training employs Adam for 150,000 iterations with weight decay 0.1 and a decaying learning-rate schedule. The optimized attributes are quantized to 8 bits per component and exported as gzipped glTF 2.0 (~434 MB).
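For intuition, the per-level spatial hash Instant-NGP uses to index such a table (the primes come from Müller et al. 2022; the table size is left as a parameter since the entry count is not stated above):

```python
import numpy as np

# Per-dimension primes from Instant-NGP (Mueller et al., 2022).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_index(coords: np.ndarray, table_size: int) -> np.ndarray:
    """Map integer 3D grid coordinates (N, 3) to hash-table slots by
    XOR-ing prime-scaled coordinates and wrapping into the table."""
    h = np.zeros(coords.shape[0], dtype=np.uint64)
    for d in range(3):
        h ^= coords[:, d].astype(np.uint64) * PRIMES[d]
    return h % np.uint64(table_size)
```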
5. Performance Metrics and Comparative Analysis
Evaluation on the mip-NeRF 360 test scenes yields the following photometric metrics (PSNR and SSIM: higher is better; LPIPS: lower is better), showing that BakedSDF outperforms prior real-time view-synthesis approaches on both outdoor and indoor scenes:
| Method | Outdoor (PSNR / SSIM / LPIPS) | Indoor (PSNR / SSIM / LPIPS) |
|---|---|---|
| Deep Blending | 21.54 / 0.524 / 0.364 | – |
| MobileNeRF | 21.95 / 0.470 / 0.470 | – |
| Ours (BakedSDF) | 22.47 / 0.585 / 0.349 | 27.06 / 0.836 / 0.258 |
Rendering speed, power, and storage efficiency are summarized as:
| Method | Power (W) | 1080p FPS | FPS/W | Disk Size (MB) |
|---|---|---|---|---|
| Instant-NGP | 350 | 3.78 | 0.011 | 107 |
| MobileNeRF | 85 | 50.06 | 0.589 | 342 |
| Ours | 85 | 72.21 | 0.850 | 434 |
At an 85 W power budget, BakedSDF yields 1.44× higher FPS/Watt than MobileNeRF, and 77× higher than Instant-NGP, with higher photometric image quality on all metrics.
6. Real-Time Rasterization and Practical Implementation
BakedSDF outputs a mesh+SG glTF asset compatible with standard WebGL- or Vulkan-based renderers. At runtime, the view-dependent color is computed in a simple fragment shader using the SG evaluation, without custom raymarching or neural inference. The scene is bounded by a convex hull of 32 inflated planes and a far sphere (radius 500 units) to mitigate numerical instabilities.
Vertex-order optimization is employed to maximize utilization of on-GPU vertex caches. All SG parameters are stored using 8-bit quantization per component, with quantization supervised during training via straight-through estimators.
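A minimal JAX sketch of such straight-through quantization (the [0, 1] value range is our assumption):

```python
import jax
import jax.numpy as jnp

def quantize_ste(x: jnp.ndarray) -> jnp.ndarray:
    """Simulate 8-bit quantization of values in [0, 1] during training.
    Forward pass: round to 256 levels. Backward pass: identity, so
    gradients flow through the non-differentiable rounding."""
    q = jnp.round(x * 255.0) / 255.0
    return x + jax.lax.stop_gradient(q - x)
```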
The system can be rendered either with a JAX implementation or fully in-browser with pure WebGL shaders, on laptops or mobile GPUs. The output watertight mesh facilitates appearance editing (e.g., separating diffuse from specular shading, material recoloring, shadow simulation), physics and collision (rigid-body simulation, synthetic object insertion), and texture or lightmap baking.
BakedSDF establishes a method for bridging the fidelity of neural radiance fields with the real-time performance and interoperability of polygonal graphics, by "baking" a learned neural SDF into a rasterizable mesh augmented with compact SG-based appearance representations (Yariv et al., 2023).