BakedSDF: Real-Time 3D Scene Reconstruction
- BakedSDF is a 3D scene reconstruction system that transforms multi-view imagery into high-quality, mesh-based representations for real-time photorealistic view synthesis.
- It employs a three-stage pipeline—hybrid neural modeling, mesh extraction (‘baking’), and per-vertex spherical Gaussian appearance encoding—to optimize real-time rendering.
- Performance metrics show BakedSDF delivers higher speed, accuracy, and energy efficiency than previous methods, enabling applications like material editing and physics simulation.
BakedSDF is a real-time 3D scene reconstruction and view synthesis system that transforms multi-view imagery of large, unbounded real-world environments into high-quality triangle meshes equipped with efficient, compact, view-dependent appearance models. The resulting representations support photorealistic novel view synthesis and are optimized for commodity GPU hardware, leveraging accelerated polygon rasterization pipelines instead of per-frame neural network inference. BakedSDF employs a three-stage pipeline: hybrid neural scene modeling, mesh extraction and refinement ("baking"), and per-vertex appearance encoding with spherical Gaussians, followed by a dedicated optimization stage for high-fidelity rendering. This approach demonstrates state-of-the-art accuracy, speed, and efficiency for real-time rendering and facilitates downstream applications such as material editing and physics simulation (Yariv et al., 2023).
1. Hybrid Neural Volume–Surface Representation
BakedSDF builds upon VolSDF's hybrid volumetric and surface formulation but operates in "contracted" coordinate space (following mip-NeRF 360, Equation 3; a sketch of the contraction follows the list below), which maps unbounded scenes into bounded domains and regularizes distant geometry. The core neural representation includes:
- A small "proposal" MLP (4×256, swish activations) predicting coarse densities for importance sampling.
- A larger NeRF-style SDF MLP (8×1024, swish activations) predicting a scalar signed distance and a 256-dimensional appearance bottleneck.
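As noted above, both MLPs operate on contracted coordinates. A minimal sketch of the mip-NeRF 360 contraction (the function name and the numerical guard are ours):

```python
import numpy as np

def contract(x: np.ndarray) -> np.ndarray:
    """mip-NeRF 360 scene contraction: the identity inside the unit ball,
    while points outside are pulled into the shell 1 < ||x'|| < 2."""
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-9)
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))
```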
The signed distance field (SDF) $f(\mathbf{x})$, taken positive outside the surface, implicitly defines the surface $\mathcal{S} = \{\mathbf{x} : f(\mathbf{x}) = 0\}$. Following VolSDF, the volumetric density is parameterized by

$$\sigma(\mathbf{x}) = \alpha \, \Psi_\beta\!\left(-f(\mathbf{x})\right),$$

where $\Psi_\beta$ is the zero-mean Laplace CDF of scale $\beta$ and $\alpha = \beta^{-1}$. As $\beta \to 0$, the density approaches a "thin" surface.
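A minimal numerical sketch of this density parameterization (function names are ours):

```python
import numpy as np

def laplace_cdf(s: np.ndarray, beta: float) -> np.ndarray:
    """CDF of a zero-mean Laplace distribution with scale beta."""
    return np.where(s <= 0.0,
                    0.5 * np.exp(np.minimum(s, 0.0) / beta),
                    1.0 - 0.5 * np.exp(-np.maximum(s, 0.0) / beta))

def density_from_sdf(f: np.ndarray, beta: float) -> np.ndarray:
    """VolSDF-style density alpha * Psi_beta(-f) with alpha = 1/beta.
    As beta shrinks, density concentrates into a thin shell at f == 0."""
    return (1.0 / beta) * laplace_cdf(-f, beta)
```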
Regularization is enforced by the Eikonal loss

$$\mathcal{L}_{\text{eik}} = \mathbb{E}_{\mathbf{x}}\Big[\big(\lVert \nabla_{\mathbf{x}} f(\mathbf{x}) \rVert_2 - 1\big)^2\Big],$$

encouraging $f$ to behave as a proper signed distance function with unit-norm gradient.
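A sketch of this penalty using JAX autodiff; `sdf` stands in for any differentiable scalar field, and all names here are ours:

```python
import jax
import jax.numpy as jnp

def eikonal_loss(sdf, points: jnp.ndarray) -> jnp.ndarray:
    """Mean squared deviation of ||grad f|| from 1 over sampled points.
    sdf: function mapping a 3-vector to a scalar signed distance."""
    grads = jax.vmap(jax.grad(sdf))(points)       # (N, 3) spatial gradients
    grad_norms = jnp.linalg.norm(grads, axis=-1)  # (N,)
    return jnp.mean((grad_norms - 1.0) ** 2)
```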
The scale $\beta$ is annealed from an initial value $\beta_0$ to a final value $\beta_1$ according to

$$\beta_t = \frac{\beta_0}{1 + \left(\frac{\beta_0 - \beta_1}{\beta_1}\right) t^{0.8}},$$

where $t \in [0, 1]$ is the normalized training progress, yielding initially soft, then sharp, surfaces as training progresses.
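As a sketch (the endpoint values stay symbolic, since the section does not state them):

```python
def beta_schedule(t: float, beta0: float, beta1: float) -> float:
    """Anneal the Laplace scale from beta0 (at t=0) down to beta1 (at t=1).
    The t**0.8 exponent makes beta fall faster early in training."""
    return beta0 / (1.0 + ((beta0 - beta1) / beta1) * t ** 0.8)
```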
View-dependent color is modeled using a diffuse branch $\mathbf{c}_d(\mathbf{x})$ and a specular branch $\mathbf{c}_s(\mathbf{x}, \hat{\boldsymbol{\omega}}_r)$, where $\hat{\boldsymbol{\omega}}_r$ is the viewing direction reflected about the surface normal $\hat{\mathbf{n}} = \nabla f / \lVert \nabla f \rVert$. The per-ray color is computed via standard NeRF quadrature:

$$\mathbf{C}(\mathbf{r}) = \sum_i w_i \, \mathbf{c}_i, \qquad w_i = T_i \big(1 - e^{-\sigma_i \delta_i}\big), \qquad T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big),$$

where $\delta_i$ is the length of the $i$-th ray interval.
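A minimal single-ray sketch of this quadrature (array shapes are our assumption):

```python
import numpy as np

def composite_ray(sigmas: np.ndarray, deltas: np.ndarray,
                  colors: np.ndarray) -> np.ndarray:
    """Standard NeRF alpha compositing along one ray.
    sigmas, deltas: (S,) densities and interval lengths; colors: (S, 3)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # per-interval opacity
    tau = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    trans = np.exp(-tau)                                    # transmittance T_i
    weights = trans * alphas                                # w_i = T_i * alpha_i
    return np.sum(weights[:, None] * colors, axis=0)
```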
The objective function combines the photometric loss

$$\mathcal{L}_{\text{data}} = \mathbb{E}_{\mathbf{r}}\Big[\lVert \mathbf{C}(\mathbf{r}) - \mathbf{C}_{\text{gt}}(\mathbf{r}) \rVert_2^2\Big]$$

with a weighted Eikonal term $\mathcal{L}_{\text{eik}}$ and the mip-NeRF 360 proposal loss. Optimization is carried out for 250,000 Adam steps.
2. Mesh Extraction and Refinement (“Baking”)
After convergence, BakedSDF evaluates the learned SDF $f$ on a regular grid in contracted space. Grid points are retained if their volumetric rendering weight exceeds 0.005 or if the proposal MLP marks them as "non-empty," thereby reducing "floaters" in unobserved regions.
The level set is extracted via Marching Cubes, at an iso-value chosen to compensate for the Laplace blur of the density. To repair small mesh holes, 32 iterations of region growing are performed: for each current mesh vertex, its surrounding grid neighborhood is reactivated and Marching Cubes is rerun locally.
The mesh is then mapped back to Euclidean world space. A vertex-reordering pass (as in Sander et al. 2007) is applied to optimize for GPU vertex-cache locality.
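A minimal sketch of the extraction built on scikit-image's Marching Cubes; the grid construction, weight thresholds, and region-growing repair are elided, and all names are ours:

```python
import numpy as np
from skimage import measure

def extract_mesh(sdf_grid: np.ndarray, keep_mask: np.ndarray,
                 voxel_size: float, iso: float = 0.0):
    """Run Marching Cubes on a dense SDF grid, skipping culled cells.
    sdf_grid: (R, R, R) signed distances; keep_mask: (R, R, R) bools."""
    # Push culled cells far outside the surface so they emit no triangles.
    masked = np.where(keep_mask, sdf_grid, 1e6)
    verts, faces, normals, _ = measure.marching_cubes(
        masked, level=iso, spacing=(voxel_size,) * 3)
    return verts, faces, normals
```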
3. Compact Per-Vertex Spherical Gaussian Appearance Model
Each mesh vertex is assigned:
- A diffuse color $\mathbf{c}_d \in \mathbb{R}^3$.
- $N$ view-dependent spherical Gaussian (SG) lobes, each parameterized by $(\boldsymbol{\mu}_i, \mathbf{c}_i, \lambda_i)$: $\boldsymbol{\mu}_i$ is a unit mean direction, $\mathbf{c}_i \in \mathbb{R}^3$ the RGB weight, and $\lambda_i > 0$ the sharpness.
Given a camera ray of direction $\hat{\mathbf{d}}$, the $i$-th SG contributes

$$G_i(\hat{\mathbf{d}}) = \mathbf{c}_i \exp\big(\lambda_i (\boldsymbol{\mu}_i \cdot \hat{\mathbf{d}} - 1)\big),$$

with the final fragment color

$$\mathbf{C}(\hat{\mathbf{d}}) = \mathbf{c}_d + \sum_{i=1}^{N} G_i(\hat{\mathbf{d}}).$$
In practice, three SGs are used for vertices in the central "near" region of the contracted space and one SG for vertices in the "far" periphery, balancing view-dependent quality against per-vertex storage.
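Because each fragment then costs only a few multiply-adds and exponentials, this model rasterizes cheaply; a sketch of the evaluation (shapes are our assumption):

```python
import numpy as np

def shade_fragment(view_dir: np.ndarray, diffuse: np.ndarray,
                   sg_mu: np.ndarray, sg_color: np.ndarray,
                   sg_sharpness: np.ndarray) -> np.ndarray:
    """Diffuse color plus a sum of N spherical Gaussian lobes.
    view_dir: (3,) unit vector; sg_mu: (N, 3) unit lobe directions;
    sg_color: (N, 3) RGB weights; sg_sharpness: (N,) lambdas."""
    cos = sg_mu @ view_dir                        # (N,) cosines to lobe means
    lobes = np.exp(sg_sharpness * (cos - 1.0))    # each peaks at 1 when aligned
    return diffuse + (lobes[:, None] * sg_color).sum(axis=0)
```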
4. Optimization of the Baked Mesh and SGs
To optimize appearance, the mesh is rasterized into each training view (1920×1080) and, per pixel $p$, the triangle identifier and barycentric coordinates are recorded. Per-vertex attributes are refined by minimizing a per-pixel color loss built on the general robust function $\rho(\cdot, \alpha, c)$ of Barron [2019]:

$$\mathcal{L} = \sum_{p} \rho\big(\lVert \hat{\mathbf{C}}_p - \mathbf{C}_p \rVert, \alpha, c\big),$$

where $\hat{\mathbf{C}}_p$ is the rendered color and $\mathbf{C}_p$ the ground truth; an additional global "clear color" is optimized for pixels not covered by any mesh triangle.
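A sketch of the general robust function $\rho$ for the non-degenerate case $\alpha \notin \{0, 2\}$; the $\alpha$ and $c$ values BakedSDF uses are not stated in this section, so they remain free parameters:

```python
import numpy as np

def robust_loss(x: np.ndarray, alpha: float, c: float) -> np.ndarray:
    """General robust loss of Barron [2019] for alpha not in {0, 2}:
    rho(x) = |a-2|/a * (((x/c)^2 / |a-2| + 1)^(a/2) - 1), a = alpha."""
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)
```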
Scaling to scenes with hundreds of millions of vertices is achieved by parameterizing per-vertex attributes with an Instant-NGP hash grid (18 resolution levels). Training employs Adam for 150,000 iterations with weight decay 0.1 and a decaying learning-rate schedule. The optimized attributes are quantized to 8 bits per component and exported as gzipped glTF 2.0 (~434 MB).
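For intuition, the per-level spatial hash Instant-NGP uses to index such a table (the primes come from Müller et al. 2022; the table size is left as a parameter since the entry count is not stated above):

```python
import numpy as np

# Per-dimension primes from Instant-NGP (Mueller et al., 2022).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_index(coords: np.ndarray, table_size: int) -> np.ndarray:
    """Map integer 3D grid coordinates (N, 3) to hash-table slots by
    XOR-ing prime-scaled coordinates and wrapping into the table."""
    h = np.zeros(coords.shape[0], dtype=np.uint64)
    for d in range(3):
        h ^= coords[:, d].astype(np.uint64) * PRIMES[d]
    return h % np.uint64(table_size)
```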
5. Performance Metrics and Comparative Analysis
Evaluation on the mip-NeRF 360 test scenes yields the following photometric metrics (PSNR and SSIM: higher is better; LPIPS: lower is better), showing that BakedSDF outperforms prior real-time view-synthesis approaches on both outdoor and indoor scenes:
| Method | Outdoor (PSNR / SSIM / LPIPS) | Indoor (PSNR / SSIM / LPIPS) |
|---|---|---|
| Deep Blending | 21.54 / 0.524 / 0.364 | – |
| MobileNeRF | 21.95 / 0.470 / 0.470 | – |
| Ours (BakedSDF) | 22.47 / 0.585 / 0.349 | 27.06 / 0.836 / 0.258 |
Rendering speed, power, and storage efficiency are summarized as:
| Method | Power (W) | 1080p FPS | FPS/W | Disk Size (MB) |
|---|---|---|---|---|
| Instant-NGP | 350 | 3.78 | 0.011 | 107 |
| MobileNeRF | 85 | 50.06 | 0.589 | 342 |
| Ours | 85 | 72.21 | 0.850 | 434 |
At an 85 W power budget, BakedSDF yields 1.44× higher FPS/Watt than MobileNeRF, and 77× higher than Instant-NGP, with higher photometric image quality on all metrics.
6. Real-Time Rasterization and Practical Implementation
BakedSDF outputs a mesh+SG glTF asset compatible with standard WebGL- or Vulkan-based renderers. At runtime, the view-dependent color is computed in a simple fragment shader using the SG evaluation, without custom raymarching or neural inference. The scene is bounded by a convex hull of 32 inflated planes and a far sphere (radius 500 units) to mitigate numerical instabilities.
Vertex-order optimization is employed to maximize utilization of on-GPU vertex caches. All SG parameters are stored using 8-bit quantization per component, with quantization supervised during training via straight-through estimators.
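A minimal JAX sketch of such straight-through quantization (the [0, 1] value range is our assumption):

```python
import jax
import jax.numpy as jnp

def quantize_ste(x: jnp.ndarray) -> jnp.ndarray:
    """Simulate 8-bit quantization of values in [0, 1] during training.
    Forward pass: round to 256 levels. Backward pass: identity, so
    gradients flow through the non-differentiable rounding."""
    q = jnp.round(x * 255.0) / 255.0
    return x + jax.lax.stop_gradient(q - x)
```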
The system can be rendered either with a JAX implementation or fully in-browser with pure WebGL shaders, on laptops or mobile GPUs. The output watertight mesh facilitates appearance editing (e.g., separating diffuse from specular shading, material recoloring, shadow simulation), physics and collision (rigid-body simulation, synthetic object insertion), and texture or lightmap baking.
BakedSDF establishes a method for bridging the fidelity of neural radiance fields with the real-time performance and interoperability of polygonal graphics, by "baking" a learned neural SDF into a rasterizable mesh augmented with compact SG-based appearance representations (Yariv et al., 2023).