
BakedSDF: Real-Time 3D Scene Reconstruction

Updated 20 February 2026
  • BakedSDF is a real-time 3D scene reconstruction system that transforms multi-view imagery into high-quality, mesh-based representations for photorealistic view synthesis.
  • It employs a three-stage pipeline—hybrid neural modeling, mesh extraction (‘baking’), and per-vertex spherical Gaussian appearance encoding—to optimize real-time rendering.
  • Performance metrics show BakedSDF delivers higher speed, accuracy, and energy efficiency than previous methods, enabling applications like material editing and physics simulation.

BakedSDF is a real-time 3D scene reconstruction and view synthesis system that transforms multi-view imagery of large, unbounded real-world environments into high-quality triangle meshes equipped with efficient, compact, view-dependent appearance models. The resulting representations support photorealistic novel view synthesis and are optimized for commodity GPU hardware, leveraging accelerated polygon rasterization pipelines instead of per-frame neural network inference. BakedSDF employs a three-stage pipeline: hybrid neural scene modeling, mesh extraction and refinement ("baking"), and per-vertex appearance encoding with spherical Gaussians, followed by a dedicated optimization stage for high-fidelity rendering. This approach demonstrates state-of-the-art accuracy, speed, and efficiency for real-time rendering and facilitates downstream applications such as material editing and physics simulation (Yariv et al., 2023).

1. Hybrid Neural Volume–Surface Representation

BakedSDF builds upon VolSDF's hybrid volumetric and surface formulation but operates in a "contracted" coordinate space, $x \mapsto \operatorname{contract}(x)$ (following mip-NeRF 360, Equation 3), which maps unbounded scenes into a bounded domain and regularizes distant geometry. The core neural representation includes:

  • A small "proposal" MLP (4×256, swish activations) predicting coarse densities for importance sampling.
  • A larger NeRF-style SDF MLP (8×1024, swish activations) predicting a scalar signed distance $f(x)$ and a 256-dimensional appearance bottleneck.
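
The contraction that makes unbounded-scene training tractable can be sketched in a few lines. This assumes the standard mip-NeRF 360 form, which leaves the unit ball untouched and squashes all remaining space into a ball of radius 2:

```python
import math

def contract(x):
    """Sketch of the mip-NeRF 360-style scene contraction.

    Points inside the unit ball are unchanged; points outside are mapped
    into the shell 1 < ||contract(x)|| < 2, so an unbounded scene fits in
    a ball of radius 2.
    """
    r = math.sqrt(sum(c * c for c in x))
    if r <= 1.0:
        return x
    scale = (2.0 - 1.0 / r) / r
    return tuple(scale * c for c in x)
```

For example, a point at distance 10 from the origin lands at radius $2 - 1/10 = 1.9$, while `contract((0.5, 0, 0))` is unchanged.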

The signed distance field (SDF), $f : \mathbb{R}^3 \to \mathbb{R}$, implicitly defines the surface $S = \{x : f(x) = 0\}$. The volumetric density is parameterized by

$$\sigma(x) = \alpha \Psi_\beta(f(x)),$$

where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$ and $\alpha = \beta^{-1}$. As $\beta \to 0$, the density concentrates into a thin shell around the surface.
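
A minimal pure-Python sketch of this density parameterization, following the text's sign convention for $f$:

```python
import math

def laplace_cdf(s, beta):
    # CDF of a zero-mean Laplace distribution with scale beta.
    if s <= 0.0:
        return 0.5 * math.exp(s / beta)
    return 1.0 - 0.5 * math.exp(-s / beta)

def density(sdf_value, beta):
    # sigma(x) = alpha * Psi_beta(f(x)) with alpha = 1/beta, as in the text.
    # Density saturates toward alpha where f >> beta, decays toward zero
    # where f << -beta, and equals alpha/2 exactly on the surface f = 0.
    alpha = 1.0 / beta
    return alpha * laplace_cdf(sdf_value, beta)
```

Shrinking `beta` raises the peak density $\alpha = 1/\beta$ while narrowing the transition band, which is exactly the "soft surface becomes sharp surface" behavior the annealing below exploits.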

Regularization is enforced by the Eikonal loss

$$L_\mathrm{SDF} = \mathbb{E}_x \left[ (\|\nabla f(x)\|_2 - 1)^2 \right],$$

encouraging $f$ to behave as a proper signed distance function.
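
A small illustration of the Eikonal penalty, estimated with central finite differences on an analytic sphere SDF. The actual system uses autodiff gradients of the MLP; `sphere_sdf` is an illustrative stand-in whose gradient norm is exactly 1:

```python
import math
import random

def sphere_sdf(x):
    # Exact SDF of the unit sphere; its gradient norm is 1 away from the origin.
    return math.sqrt(sum(c * c for c in x)) - 1.0

def eikonal_loss(f, points, eps=1e-4):
    # Monte-Carlo estimate of E_x[(||grad f(x)||_2 - 1)^2] using central
    # finite differences (a sketch; training uses autodiff instead).
    total = 0.0
    for p in points:
        grad_sq = 0.0
        for i in range(3):
            hi = list(p)
            hi[i] += eps
            lo = list(p)
            lo[i] -= eps
            g = (f(hi) - f(lo)) / (2 * eps)
            grad_sq += g * g
        total += (math.sqrt(grad_sq) - 1.0) ** 2
    return total / len(points)
```

For a true SDF the loss is numerically zero; for a non-metric implicit function it is positive, which is what pushes $f$ toward unit-gradient behavior.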

The $\beta$ parameter is annealed from $\beta_0 = 0.1$ to $\beta_1 \in \{0.015, 0.003, 0.001\}$ according to

$$\beta_t = \beta_0 \left(1 + \frac{\beta_0 - \beta_1}{\beta_1} t^{0.8}\right)^{-1},$$

yielding initially soft, then sharp, surfaces as training progresses.
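The schedule is easy to verify numerically: with $t$ the normalized training progress, it starts exactly at $\beta_0$ and ends exactly at $\beta_1$:

```python
def beta_schedule(t, beta0=0.1, beta1=0.001):
    # beta_t = beta0 * (1 + ((beta0 - beta1) / beta1) * t^0.8)^(-1),
    # monotonically decreasing from beta0 (t = 0) to beta1 (t = 1).
    return beta0 / (1.0 + ((beta0 - beta1) / beta1) * t ** 0.8)
```

The $t^{0.8}$ exponent makes the decay front-loaded: $\beta$ drops quickly early in training and approaches $\beta_1$ more gradually toward the end.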

View-dependent color is modeled using a diffuse branch $c_d(x)$ and a specular branch $c_s(x, d_\mathrm{reflected})$, where $d_\mathrm{reflected}$ is the viewing direction reflected about the surface normal $n(x) = \nabla f(x)/\|\nabla f(x)\|$. The per-ray color $C$ is computed via standard NeRF quadrature:

$$C = \sum_i T_i (1 - \exp(-\sigma_i \Delta_i))\, c_i, \quad T_i = \exp\left(-\sum_{j<i} \sigma_j \Delta_j\right).$$
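
The quadrature above can be sketched as a running-transmittance loop (a minimal single-ray version, not the batched training implementation):

```python
import math

def composite(sigmas, deltas, colors):
    # Standard NeRF quadrature: alpha_i = 1 - exp(-sigma_i * delta_i),
    # T_i = prod_{j<i} (1 - alpha_j); accumulates weighted sample colors.
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for sigma, delta, c in zip(sigmas, deltas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        for k in range(3):
            color[k] += w * c[k]
        transmittance *= 1.0 - alpha
    return color
```

An effectively opaque first sample returns that sample's color unchanged, while zero density everywhere yields black, matching the closed-form weights in the equation above.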

The objective function combines the photometric loss,

$$L_\mathrm{data} = \mathbb{E}_r \left[\|C(r) - C_\mathrm{gt}(r)\|_2^2\right],$$

with $0.1 \cdot L_\mathrm{SDF}$ and the mip-NeRF 360 proposal loss. Optimization is carried out for 250,000 Adam steps.

2. Mesh Extraction and Refinement (“Baking”)

After convergence, BakedSDF evaluates $f(x)$ on a regular $2048^3$ grid in contracted space. Grid points are retained if their volumetric rendering weight exceeds 0.005 or if the proposal MLP marks them as "non-empty," which suppresses "floaters" in unobserved regions.

The zero-level set is extracted via Marching Cubes at iso-value $\tau = 0.001$, compensating for the Laplace blur. To repair small mesh holes, 32 iterations of region growing are performed: for each current mesh vertex, its $8^3$ grid neighborhood is reactivated and Marching Cubes is rerun locally.

The mesh is then mapped back to Euclidean world space. A vertex-reordering pass (as in Sander et al. 2007) is applied to optimize for GPU vertex-cache locality.
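
Mapping vertices back to world space amounts to inverting the contraction. A sketch, assuming the standard mip-NeRF 360 form $\operatorname{contract}(x) = (2 - 1/\|x\|)\,x/\|x\|$ for $\|x\| > 1$:

```python
import math

def uncontract(y):
    """Sketch of the inverse of the mip-NeRF 360 contraction.

    Conceptually what "mapping the mesh back to Euclidean world space"
    requires: vertices extracted in the contracted ball of radius 2 are
    sent back to their unbounded world positions.
    """
    r = math.sqrt(sum(c * c for c in y))
    if r <= 1.0:
        return y
    # From ||y|| = 2 - 1/||x||: ||x|| = 1 / (2 - ||y||), valid for 1 < ||y|| < 2.
    scale = 1.0 / (r * (2.0 - r))
    return tuple(scale * c for c in y)
```

Note that world-space distances blow up as $\|y\| \to 2$, which is why the far shell of the contracted grid corresponds to very distant geometry.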

3. Compact Per-Vertex Spherical Gaussian Appearance Model

Each mesh vertex $v$ is assigned:

  • A diffuse color $c_d(v)$.
  • $N$ view-dependent spherical Gaussian (SG) lobes, each parameterized by $(\mu, c, \lambda)$: $\mu$ is a unit mean direction, $c \in \mathbb{R}^3$ the RGB weight, and $\lambda > 0$ the sharpness.

Given a camera ray with direction $d$, the SG contribution is given by

$$G(d) = \sum_{i=1}^N c_i \exp[\lambda_i (\mu_i \cdot d - 1)],$$

with the final fragment color

$$C = c_d + G(d).$$

In practice, three SGs are used for "near" regions ($\|x\| \leq 1$) and one SG for "far" regions ($\|x\| > 1$), balancing view-dependent quality against per-vertex storage.
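
The per-fragment evaluation is cheap enough to illustrate directly; a minimal Python analogue of the fragment-shader computation:

```python
import math

def sg_color(diffuse, lobes, d):
    # Fragment color C = c_d + sum_i c_i * exp(lambda_i * (mu_i . d - 1)).
    # `lobes` is a list of (mu, c, lam) tuples; mu and d are unit vectors.
    out = list(diffuse)
    for mu, c, lam in lobes:
        g = math.exp(lam * (sum(m * dd for m, dd in zip(mu, d)) - 1.0))
        for k in range(3):
            out[k] += c[k] * g
    return out
```

When the view direction aligns with a lobe mean ($\mu_i \cdot d = 1$) the exponential is 1 and the full lobe color is added; off-axis views fall off at a rate set by the sharpness $\lambda_i$, which is what produces view-dependent highlights from purely per-vertex data.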

4. Optimization of the Baked Mesh and SGs

To optimize appearance, the mesh is rasterized into each training view (1920×1080) and, for each pixel $p$, the triangle identifier and barycentric coordinates are recorded. The per-vertex attributes $\{c_d(v), (\mu_i, c_i, \lambda_i)\}$ are then refined by minimizing a robust per-pixel color loss (Barron, 2019):

$$L = \sum_p \rho\left( C_\mathrm{rendered}(p) - C_\mathrm{captured}(p);\ \alpha = 0,\ c = 1/5 \right),$$

with an additional global clear color for pixels not covered by any mesh triangle.
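
At $\alpha = 0$, Barron's general robust loss reduces to a Cauchy/Lorentzian form, $\rho(x;\, 0, c) = \log\left(\tfrac{1}{2}(x/c)^2 + 1\right)$; a sketch using the paper's scale $c = 1/5$:

```python
import math

def robust_loss(x, c=0.2):
    # Barron's general robust loss at alpha = 0 (Cauchy/Lorentzian case):
    # behaves quadratically for small residuals |x| << c and only
    # logarithmically for outliers, so moving objects and exposure
    # changes do not dominate the appearance fit.
    return math.log(0.5 * (x / c) ** 2 + 1.0)
```

The log growth for large residuals is the point: a squared loss would let a few badly-explained pixels dominate the SG fit, while this loss caps their influence.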

Scaling to scenes with hundreds of millions of vertices is achieved by parameterizing per-vertex attributes with an Instant-NGP hash grid (18 levels, $2^{21}$ entries, $N_\mathrm{max} = 8192$). Training employs Adam for 150,000 iterations with weight decay 0.1 and a stepped learning-rate schedule $10^{-3} \to 10^{-4} \to 10^{-5}$. The optimized attributes are quantized to 8 bits per component and exported as gzipped glTF 2.0 (~434 MB).
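
The 8-bit attribute quantization can be sketched as a uniform quantize/dequantize pair (the exact rounding and per-attribute ranges are assumptions; the real export pipeline may differ):

```python
def quantize8(x, lo, hi):
    # Uniform 8-bit quantization of a scalar attribute in [lo, hi].
    q = round((x - lo) / (hi - lo) * 255.0)
    return max(0, min(255, q))

def dequantize8(q, lo, hi):
    # Reconstruction used at render time; error is at most half a step.
    return lo + (q / 255.0) * (hi - lo)
```

A round trip through an 8-bit code loses at most half a quantization step, about $0.2\%$ of the attribute's range, which is why per-component 8-bit storage is viable for colors and lobe parameters.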

5. Performance Metrics and Comparative Analysis

Evaluation on the mip-NeRF 360 test scenes yields the following photometric metrics (higher is better for PSNR and SSIM; lower is better for LPIPS). BakedSDF outperforms prior real-time view-synthesis approaches in both outdoor and indoor scenes:

Method            Outdoor (PSNR / SSIM / LPIPS)   Indoor (PSNR / SSIM / LPIPS)
Deep Blending     21.54 / 0.524 / 0.364           —
MobileNeRF        21.95 / 0.470 / 0.470           —
Ours (BakedSDF)   22.47 / 0.585 / 0.349           27.06 / 0.836 / 0.258

Rendering speed, power, and storage efficiency are summarized as:

Method        Power (W)   1080p FPS   FPS/W   Disk Size (MB)
Instant-NGP   350         3.78        0.011   107
MobileNeRF    85          50.06       0.589   342
Ours          85          72.21       0.850   434

At an 85 W power budget, BakedSDF delivers 1.44× the FPS per watt of MobileNeRF and 77× that of Instant-NGP, while achieving higher photometric quality on all metrics.
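
These ratios follow directly from the FPS/W column of the table:

```python
# FPS-per-watt values taken from the efficiency table above.
fps_per_watt = {"Instant-NGP": 0.011, "MobileNeRF": 0.589, "Ours": 0.850}

ratio_vs_mobilenerf = fps_per_watt["Ours"] / fps_per_watt["MobileNeRF"]  # ~1.44x
ratio_vs_ingp = fps_per_watt["Ours"] / fps_per_watt["Instant-NGP"]       # ~77x
```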

6. Real-Time Rasterization and Practical Implementation

BakedSDF outputs a mesh+SG glTF asset compatible with standard WebGL- or Vulkan-based renderers. At runtime, the view-dependent color is computed in a simple fragment shader using the SG evaluation, without custom raymarching or neural inference. The scene is bounded by a convex hull of 32 inflated planes and a far sphere (radius 500 units) to mitigate numerical instabilities.

Vertex-order optimization is employed to maximize utilization of on-GPU vertex caches. All SG parameters are stored using 8-bit quantization per component, with quantization supervised during training via straight-through estimators.

The renderer runs fully in-browser via standard WebGL shaders on laptop and mobile GPUs. The output watertight mesh facilitates appearance editing (e.g., diffuse/specular separation, material recoloring, shadow simulation), physics and collision (rigid-body simulation, synthetic object insertion), and texture or lightmap baking.

BakedSDF establishes a method for bridging the fidelity of neural radiance fields with the real-time performance and interoperability of polygonal graphics, by "baking" a learned neural SDF into a rasterizable mesh augmented with compact SG-based appearance representations (Yariv et al., 2023).
