Voxel Grid Patchification
- Voxel grid patchification is a method that decomposes complex 3D geometries into local voxel feature grids to facilitate precise neural implicit representations.
- It employs trilinear interpolation and a shared MLP to compute local signed distance fields, ensuring accurate reconstruction of intricate geometries.
- Adaptive merging using octree-based structures and constructive solid geometry operations preserves sharp features while reducing computational demand and memory usage.
Voxel grid patchification is a technique for decomposing complex 3D shapes into structured local feature grids (“patch volumes”), each supporting accurate neural implicit representations, and then merging them adaptively using hierarchical data structures. The approach enables efficient, high-fidelity modeling of curved, sharp-featured, or boundary-rich geometries by learning local signed distance fields (SDFs) and applying constructive solid geometry (CSG) operations in a spatially localized and computationally tractable manner. The Patch-Grid method, as introduced in "Patch-Grid: An Efficient and Feature-Preserving Neural Implicit Surface Representation" (Lin et al., 2023), formalizes this approach and addresses key limitations of monolithic MLP-based models in terms of sharp feature preservation, efficiency, and memory footprint.
1. Patch Feature Volume Construction
The pipeline begins with a boundary representation (B-Rep) surface, partitioned into surface patches $\{S_i\}_{i=1}^{N}$. For each patch $S_i$, a mapping to a regular voxel grid $V_i$ (“patch volume”) is constructed. Each $V_i$ is defined as the union of the non-empty voxels covering the surface patch, obtained by subdividing the patch's bounding region into $n_x \times n_y \times n_z$ cells at resolution $r$. Every voxel grid corner holds a $C$-dimensional learnable feature vector $\mathbf{f} \in \mathbb{R}^{C}$.
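A minimal sketch of building one patch volume follows. It assumes the patch is given as a set of sample points and that a voxel counts as non-empty when it contains at least one sample; the actual method may test the patch geometry directly. The function name `build_patch_volume` and the uniform voxel edge length `resolution` are illustrative assumptions. The sketch returns the occupied voxels and the set of grid corners, each of which would store one learnable feature vector.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Index = Tuple[int, int, int]


def build_patch_volume(samples: Iterable[Point], resolution: float):
    """Hypothetical sketch: sparse voxel grid covering points sampled on one patch."""
    pts = list(samples)
    lo = tuple(min(p[k] for p in pts) for k in range(3))
    hi = tuple(max(p[k] for p in pts) for k in range(3))
    # Cells per axis over the patch bounding box; at least one even for a flat axis.
    dims = tuple(max(1, math.ceil((hi[k] - lo[k]) / resolution)) for k in range(3))

    # A voxel is "non-empty" if it contains at least one sample (assumption).
    occupied: Set[Index] = set()
    for p in pts:
        occupied.add(tuple(min(int((p[k] - lo[k]) / resolution), dims[k] - 1)
                           for k in range(3)))

    # Each occupied voxel contributes its 8 corners; shared corners are stored once.
    corners: Set[Index] = set()
    for i, j, l in occupied:
        for di in (0, 1):
            for dj in (0, 1):
                for dl in (0, 1):
                    corners.add((i + di, j + dj, l + dl))
    return lo, dims, occupied, corners


# Usage: a flat 1 x 1 square patch in the z = 0 plane, sampled every 0.05.
square = [(i / 20, j / 20, 0.0) for i in range(21) for j in range(21)]
origin, dims, voxels, corners = build_patch_volume(square, resolution=0.25)
```

For this flat patch the grid has dimensions (4, 4, 1), all 16 voxels are occupied, and 50 distinct corners carry features. For a curved patch inside a larger bounding box, the pruning would discard many more empty voxels.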
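For a query point $\mathbf{x} \in \mathbb{R}^3$, its position is converted to local patch coordinates and the indices of the enclosing voxel are computed. The patch feature at $\mathbf{x}$, $\mathbf{F}_i(\mathbf{x})$, is obtained via trilinear interpolation over that voxel's eight corners:
$$\mathbf{F}_i(\mathbf{x}) = \sum_{\mathbf{c} \in \{0,1\}^3} w_{\mathbf{c}}(\mathbf{x})\, \mathbf{f}_{i,\mathbf{c}},$$
where $\mathbf{f}_{i,\mathbf{c}}$ is the feature vector stored at corner $\mathbf{c}$ of the enclosing voxel and the weights $w_{\mathbf{c}}(\mathbf{x}) = \prod_{k=1}^{3} \big(c_k u_k + (1 - c_k)(1 - u_k)\big)$ are products of the fractional offsets $\mathbf{u} = (u_1, u_2, u_3) \in [0,1]^3$ of $\mathbf{x}$ within that cell.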
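The interpolation above can be sketched directly. Here `corner_feature` is a hypothetical lookup from a corner index to its stored feature vector, and `origin`/`resolution` describe the patch volume; these names are assumptions for illustration, not the paper's API.

```python
import math
from typing import Callable, List, Sequence, Tuple

Feature = List[float]
CornerLookup = Callable[[Tuple[int, int, int]], Feature]


def trilinear_feature(x: Sequence[float], origin: Sequence[float],
                      resolution: float, corner_feature: CornerLookup) -> Feature:
    """Interpolate the C-dimensional corner features of the voxel enclosing x."""
    g = [(x[k] - origin[k]) / resolution for k in range(3)]  # continuous grid coords
    base = [math.floor(v) for v in g]                         # enclosing voxel index
    u = [g[k] - base[k] for k in range(3)]                    # fractional offsets in [0, 1)
    out: Feature = []
    for c in ((a, b, d) for a in (0, 1) for b in (0, 1) for d in (0, 1)):
        # Weight w_c = prod_k (u_k if c_k == 1 else 1 - u_k).
        w = 1.0
        for k in range(3):
            w *= u[k] if c[k] else 1.0 - u[k]
        f = corner_feature((base[0] + c[0], base[1] + c[1], base[2] + c[2]))
        if not out:
            out = [0.0] * len(f)
        for dim, value in enumerate(f):
            out[dim] += w * value
    return out


# Usage: corner features that are linear in the corner index are reproduced exactly.
linear_corners = lambda idx: [float(idx[0]), float(idx[1]), float(idx[2]), 1.0]
feat = trilinear_feature((0.3, 0.7, 0.55), (0.0, 0.0, 0.0), 0.25, linear_corners)
```

Because trilinear interpolation reproduces linear functions, `feat` is approximately `[1.2, 2.8, 2.2, 1.0]`; the constant last component confirms that the eight weights sum to 1.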
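A small multilayer perceptron $\Phi$ (typically with 3 hidden layers), shared across all patches, decodes the interpolated patch feature $\mathbf{F}_i(\mathbf{x})$ into the per-patch signed distance field $d_i(\mathbf{x}) = \Phi(\mathbf{F}_i(\mathbf{x}))$, a neural implicit fit of the local geometry. The zero-level set $\{\mathbf{x} : d_i(\mathbf{x}) = 0\}$ approximates the surface patch $S_i$ within its patch volume $V_i$.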
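A minimal forward pass of such a shared decoder is sketched below. The source fixes only "small, shared, typically 3 hidden layers"; the input dimension 32, hidden width 64, ReLU activations, He-style initialization, and the class name `SharedSDFDecoder` are all assumptions, and training is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


class SharedSDFDecoder:
    """Hypothetical tiny MLP, shared by all patches: C-dim feature -> scalar SDF."""

    def __init__(self, in_dim: int = 32, hidden: int = 64, n_hidden: int = 3, seed: int = 0):
        rng = random.Random(seed)
        sizes = [in_dim] + [hidden] * n_hidden + [1]
        self.layers: List[Tuple[List[Vector], Vector]] = []
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
            std = math.sqrt(2.0 / fan_in)  # He-style initialization
            weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
            self.layers.append((weights, [0.0] * fan_out))

    def __call__(self, feature: Vector) -> float:
        h = feature
        for n, (weights, bias) in enumerate(self.layers):
            h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
            if n < len(self.layers) - 1:
                h = [max(0.0, v) for v in h]  # ReLU on hidden layers; linear output
        return h[0]


# Usage: one decoder instance evaluates every patch's interpolated feature.
decoder = SharedSDFDecoder()
feature_patch_a = [0.01 * k for k in range(32)]   # stands in for F_a(x)
feature_patch_b = [-0.02 * k for k in range(32)]  # stands in for F_b(x)
d_a, d_b = decoder(feature_patch_a), decoder(feature_patch_b)
```

Sharing one decoder keeps the parameter count dominated by the sparse feature grids: patches differ only in their stored features, not in their decoding weights.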
Supervision combines geometric and regularization losses evaluated on points sampled both on the surface and in its local neighborhood. The complete per-patch loss includes terms for level-set fidelity, normal agreement, pseudo-SDF values, Eikonal regularization, off-surface penalties, and feature decay, with weighting coefficients set as in (Lin et al., 2023).
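Three of the listed terms (level-set fidelity, normal agreement, and Eikonal regularization) are sketched below for an arbitrary SDF callable. The weights `w_surf`, `w_normal`, `w_eik` are placeholders, not the paper's coefficients, and gradients use central finite differences instead of automatic differentiation. The pseudo-SDF, off-surface, and feature-decay terms are omitted.

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]


def grad_fd(f: SDF, p: Vec3, h: float = 1e-5) -> Vec3:
    """Central finite-difference gradient (autodiff would be used in training)."""
    g = []
    for k in range(3):
        a, b = list(p), list(p)
        a[k] += h
        b[k] -= h
        g.append((f(tuple(a)) - f(tuple(b))) / (2.0 * h))
    return (g[0], g[1], g[2])


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def patch_loss(f: SDF, surf_pts, surf_normals, space_pts,
               w_surf: float = 1.0, w_normal: float = 1.0, w_eik: float = 0.1) -> float:
    """Level-set, normal-agreement and Eikonal terms only; weights are placeholders."""
    # Level-set fidelity: the SDF should vanish on the patch.
    l_surf = sum(abs(f(p)) for p in surf_pts) / len(surf_pts)
    # Normal agreement: 1 - cos(angle) between the SDF gradient and the given normal.
    l_normal = 0.0
    for p, n in zip(surf_pts, surf_normals):
        g = grad_fd(f, p)
        l_normal += 1.0 - sum(gc * nc for gc, nc in zip(g, n)) / max(norm(g), 1e-12)
    l_normal /= len(surf_pts)
    # Eikonal regularization: |grad f| should be 1 in the neighborhood.
    l_eik = sum((norm(grad_fd(f, p)) - 1.0) ** 2 for p in space_pts) / len(space_pts)
    return w_surf * l_surf + w_normal * l_normal + w_eik * l_eik


# Usage: an exact sphere SDF (radius 0.5) should give a loss close to zero.
sphere = lambda p: norm(p) - 0.5
on_surface = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, -0.5), (0.3, 0.4, 0.0)]
normals = [tuple(c / 0.5 for c in p) for p in on_surface]
nearby = [(0.2, 0.1, -0.3), (1.0, 1.0, 1.0), (-0.6, 0.2, 0.1)]
good = patch_loss(sphere, on_surface, normals, nearby)
bad = patch_loss(lambda p: 2.0 * sphere(p), on_surface, normals, nearby)
```

Scaling the sphere SDF by 2 keeps the same zero set and normals, so only the Eikonal term reacts: `bad` is about `w_eik * (2 - 1)^2 = 0.1`, which illustrates why the Eikonal term is needed to obtain a true distance field.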
2. Localized Merging with Octree-Based Merge Grid
To accurately and efficiently handle intersections, edges, and corners between patches, a hierarchical merge structure is constructed:
- An adjacency graph is created with each patch as a node; an edge connects two patches that share a sharp geometric edge (convex or concave). Cliques (complete subgraphs, in which every pair of patches is adjacent) with three or more nodes correspond to corners where those patches meet.
- The 3D domain is recursively subdivided into an octree: starting from the global bounding box $B$, a cell is split unless each connected component of the adjacency graph restricted to the patches present in the cell is a clique; once that holds, subdivision stops and the cell is marked as a leaf. A maximum depth caps the subdivision and thus controls granularity.
- At query or mesh extraction time, points are efficiently mapped to their enclosing octree leaf cells, each associated with the small subset of patches present in that region.
This spatial data structure localizes patch interactions: merging is performed only in cells where patches actually meet, so the blending required at one edge or corner cannot distort sharp features elsewhere, and the geometric detail of each patch is preserved.
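The adjacency test, octree subdivision, and point lookup can be sketched as follows. Assumptions: each patch is approximated by an axis-aligned bounding box to decide whether it is "present" in a cell, the stopping rule is read as "every connected component of the induced adjacency subgraph is a clique", and `build_octree`, `locate`, and `Cell` are hypothetical names.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def overlaps(a: Box, b: Box) -> bool:
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def components_are_cliques(nodes: Set[int], edges: Set[FrozenSet[int]]) -> bool:
    """True if every connected component of the subgraph induced by `nodes` is a clique."""
    seen: Set[int] = set()
    for start in nodes:
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:  # depth-first search restricted to the patches in this cell
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(u for u in nodes if frozenset((u, v)) in edges)
        seen |= comp
        if any(frozenset(pair) not in edges for pair in combinations(comp, 2)):
            return False
    return True


class Cell:
    def __init__(self, box: Box, patches: Set[int]):
        self.box, self.patches = box, patches
        self.children: List["Cell"] = []


def build_octree(box: Box, patch_boxes: Sequence[Box], edges: Set[FrozenSet[int]],
                 depth: int = 0, max_depth: int = 6) -> Cell:
    present = {i for i, pb in enumerate(patch_boxes) if overlaps(box, pb)}
    cell = Cell(box, present)
    if depth >= max_depth or components_are_cliques(present, edges):
        return cell  # leaf: patch interactions here are merged directly
    lo, hi = box
    mid = tuple((lo[k] + hi[k]) / 2 for k in range(3))
    for octant in range(8):  # bit k of `octant` selects the upper half along axis k
        c_lo = tuple(mid[k] if (octant >> k) & 1 else lo[k] for k in range(3))
        c_hi = tuple(hi[k] if (octant >> k) & 1 else mid[k] for k in range(3))
        cell.children.append(build_octree((c_lo, c_hi), patch_boxes, edges, depth + 1, max_depth))
    return cell


def locate(cell: Cell, p: Vec3) -> Cell:
    """Descend from the root to the leaf containing p (O(depth) steps)."""
    while cell.children:
        mid = [(cell.box[0][k] + cell.box[1][k]) / 2 for k in range(3)]
        cell = cell.children[sum(1 << k for k in range(3) if p[k] >= mid[k])]
    return cell


# Usage: four slab-shaped patches along x, adjacent in a chain 0-1-2-3.
slabs = [((0.0, 0.0, 0.0), (0.2, 1.0, 1.0)), ((0.1, 0.0, 0.0), (0.45, 1.0, 1.0)),
         ((0.55, 0.0, 0.0), (0.9, 1.0, 1.0)), ((0.8, 0.0, 0.0), (1.0, 1.0, 1.0))]
chain = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
root = build_octree(((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), slabs, chain)
```

At the root the chain 0-1-2-3 is not a clique, so the root splits once; each child then sees only the adjacent pair {0, 1} or {2, 3} and becomes a leaf, and `locate` returns the small patch subset to merge at a query point.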
3. Constructive Solid Geometry Merging
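Within each leaf cell $L$ of the octree, the local SDFs $d_i$ of the patches present in $L$ are merged according to CSG logic (with signed distances negative inside the solid):
- Union (concave join): $d(\mathbf{x}) = \min\big(d_i(\mathbf{x}), d_j(\mathbf{x})\big)$.
- Intersection (convex join): $d(\mathbf{x}) = \max\big(d_i(\mathbf{x}), d_j(\mathbf{x})\big)$.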
To obtain smooth gradients during training, soft-min and soft-max approximations replace these hard operators.
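The exact smoothing used by Patch-Grid is not stated here; the sketch below assumes the common log-sum-exp form, $\operatorname{smin}_k(a, b) = -\tfrac{1}{k}\log\big(e^{-ka} + e^{-kb}\big)$ and $\operatorname{smax}_k(a, b) = -\operatorname{smin}_k(-a, -b)$, with a hypothetical sharpness `k`.

```python
import math


def csg_union(a: float, b: float) -> float:
    return min(a, b)  # concave join (SDF negative inside)


def csg_intersection(a: float, b: float) -> float:
    return max(a, b)  # convex join


def soft_min(a: float, b: float, k: float = 50.0) -> float:
    """Log-sum-exp soft minimum: -log(exp(-k a) + exp(-k b)) / k, computed stably."""
    m = min(a, b)
    return m - math.log(math.exp(-k * (a - m)) + math.exp(-k * (b - m))) / k


def soft_max(a: float, b: float, k: float = 50.0) -> float:
    return -soft_min(-a, -b, k)


# Usage: the soft operators stay within log(2)/k of the hard ones.
hard, soft = csg_intersection(-0.2, -0.1), soft_max(-0.2, -0.1)
```

Larger `k` tracks the hard operators more tightly (the gap is at most $\log 2 / k$), while smaller `k` spreads gradients to both patches near the shared edge.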
For each leaf cell $L$, a merge loss penalizes the deviation of the locally merged SDF $d_L$ from zero at the on-surface points $P_L$ sampled in that cell; these per-cell losses contribute to the global training objective.
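A per-cell merge loss can be sketched as below. Assumptions: the leaf's patch SDFs are combined by a left-to-right fold with the listed operators (mixed convex/concave corners generally need a proper CSG tree), the penalty is the mean absolute merged value, and hard min/max are used (the soft versions could be swapped in). Names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]


def merge_cell(sdfs: Sequence[SDF], ops: Sequence[str], p: Vec3) -> float:
    """Fold the patch SDFs of one leaf cell left to right with the given CSG ops."""
    d = sdfs[0](p)
    for op, f in zip(ops, sdfs[1:]):
        d = min(d, f(p)) if op == "union" else max(d, f(p))
    return d


def merge_loss(sdfs: Sequence[SDF], ops: Sequence[str], surface_pts: List[Vec3]) -> float:
    """Mean |d_L(p)| over on-surface samples p of the cell (L1 penalty assumed)."""
    return sum(abs(merge_cell(sdfs, ops, p)) for p in surface_pts) / len(surface_pts)


# Usage: the convex corner (1, 1, 1) of the unit cube, where three face patches meet.
faces = [lambda p: p[0] - 1.0, lambda p: p[1] - 1.0, lambda p: p[2] - 1.0]
samples = [(1.0, 0.5, 0.9), (0.8, 1.0, 0.3), (0.6, 0.7, 1.0)]
correct = merge_loss(faces, ["intersection", "intersection"], samples)
wrong = merge_loss(faces, ["union", "union"], samples)
```

With the convex (intersection) merge every on-surface sample lies on the merged zero set, so `correct` is 0; merging the same corner with unions pulls those samples inside the solid and yields a clearly positive loss.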
4. End-to-End Patchification and Inference Workflow
The stepwise workflow is as follows:
- Preprocessing/Patchification: For each patch $S_i$, compute a minimal enclosing bounding box $B_i$, select a grid resolution based on local feature size, build the patch voxel grid $V_i$ by subdividing $B_i$ and pruning empty voxels, and initialize the feature grids.
- Octree Construction: Recursively subdivide the 3D domain using the aforementioned connectivity logic until the octree leaves localize the patch interactions.
- Training: For a number of iterations, perform:
  - Sampling of on- and off-surface points per patch.
  - Evaluation of the total loss, aggregated as the average per-patch loss plus a weighted average of the per-leaf merge losses: $\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_i + \lambda \frac{1}{M}\sum_{m=1}^{M} \mathcal{L}^{(m)}_{\text{merge}}$ for $N$ patches, $M$ leaf cells, and a weight $\lambda$.
  - Backpropagation of the per-patch and per-leaf merge losses to update feature codes and MLP weights.
- Inference/Mesh Extraction:
  - For a spatial query $\mathbf{x}$, identify the relevant patch volumes $V_i$ containing it.
  - Compute the per-patch SDF $d_i(\mathbf{x})$ for each.
  - Use the octree to find the corresponding merge cell and compute the final SDF value.
  - Extract the mesh from the global SDF using Marching Cubes at isovalue 0.
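The inference steps can be sketched end to end with toy data. Everything here is an assumption for illustration: four analytic half-space SDFs stand in for the trained patch decoders $d_i(\mathbf{x})$, the four leaf cells are hard-coded rather than built by the octree, and leaf lookup is a linear scan. The shape is the square prism $|x| \le 0.5$, $|y| \le 0.5$, whose edges are all convex, so every leaf merges its two patches with max. A `total_loss` helper mirrors the training aggregation with a placeholder weight `lam`.

```python
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]

# Hypothetical per-patch SDFs standing in for d_i(x) = MLP(F_i(x)).
PATCH_SDFS: Dict[str, SDF] = {
    "x+": lambda p: p[0] - 0.5,
    "x-": lambda p: -p[0] - 0.5,
    "y+": lambda p: p[1] - 0.5,
    "y-": lambda p: -p[1] - 0.5,
}

# Hypothetical leaf cells: (min corner, max corner, patch ids, CSG op).
LEAVES = [
    ((0.0, 0.0, -1.0), (1.0, 1.0, 1.0), ["x+", "y+"], "intersection"),
    ((-1.0, 0.0, -1.0), (0.0, 1.0, 1.0), ["x-", "y+"], "intersection"),
    ((-1.0, -1.0, -1.0), (0.0, 0.0, 1.0), ["x-", "y-"], "intersection"),
    ((0.0, -1.0, -1.0), (1.0, 0.0, 1.0), ["x+", "y-"], "intersection"),
]


def query_sdf(p: Vec3) -> float:
    for lo, hi, ids, op in LEAVES:  # a real octree would descend instead of scanning
        if all(lo[k] <= p[k] <= hi[k] for k in range(3)):
            values = [PATCH_SDFS[i](p) for i in ids]  # per-patch SDFs of this leaf
            return max(values) if op == "intersection" else min(values)
    raise ValueError("query point outside the domain")


def total_loss(patch_losses: List[float], merge_losses: List[float], lam: float = 1.0) -> float:
    """Average per-patch loss plus lam times the average per-leaf merge loss."""
    return sum(patch_losses) / len(patch_losses) + lam * sum(merge_losses) / len(merge_losses)
```

For example, `query_sdf((0.7, 0.2, 0.0))` is about 0.2 and `query_sdf((0.2, 0.1, 0.0))` is about -0.3. Outside the prism's corners the max of half-space distances underestimates the true distance, but the sign, and therefore the zero level set extracted by Marching Cubes, is correct.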
5. Computational Complexity, Memory, and Benchmarks
The voxel grid patchification strategy in Patch-Grid significantly reduces computation compared to global grid or monolithic CSG merging approaches:
| Approach | Training Time (s) | Memory Usage | Merge Complexity |
|---|---|---|---|
| Patch-Grid | ≈ 8 | ~0.64–1.28 GB (5–10M × 32 × 4 B) | $O(k)$ per point, local to each leaf cell |
| NH-Rep (monolithic CSG) | ≈ 185 | ~2.6 GB | $O(N)$ per point (global CSG over all $N$ patches) |
| NGLOD (global octree + MLP) | ≈ 2296 | N/A | N/A (no CSG merge) |
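- In Patch-Grid, the per-point evaluation cost is $O(k)$, where $k$, the number of patches associated with the enclosing leaf cell, is typically $2$–$4$; the hierarchical octree lookup adds $O(D)$ for an octree of depth $D$.
- A global dense grid at resolution $R$ would require $R^3$ feature codes, while the patchified grids store codes only for voxels near the surface patches, so their aggregate voxel count is much lower.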
- Patch-Grid achieves 20–300× speedup over alternatives in empirical benchmarks on RTX-4090 hardware, with comparable or superior surface reconstruction fidelity for CAD and other sharp-featured domains (Lin et al., 2023).
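A small worked example of the arithmetic behind these bullets follows, assuming float32 features of dimension 32 as in the table. `surface_cells` is a hypothetical stand-in for patchification: it counts only the cells of an $R^3$ grid crossed by a sphere (via a sign-change test at cell corners) to contrast surface-proportional growth with the cubic growth of a dense grid.

```python
import math


def feature_bytes(num_codes: int, dim: int = 32, bytes_per_value: int = 4) -> int:
    """Storage for num_codes float32 feature vectors of length dim."""
    return num_codes * dim * bytes_per_value


def surface_cells(resolution: int, radius: float = 0.4) -> int:
    """Count cells of a resolution^3 grid on [0, 1]^3 whose corners straddle a sphere."""
    h = 1.0 / resolution
    n = resolution + 1
    inside = [[[math.dist((i * h, j * h, k * h), (0.5, 0.5, 0.5)) < radius
                for k in range(n)] for j in range(n)] for i in range(n)]
    count = 0
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                signs = {inside[i + a][j + b][k + c]
                         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
                if len(signs) == 2:  # mixed inside/outside corners: surface crosses
                    count += 1
    return count


R = 32
dense_codes, sparse_cells = R ** 3, surface_cells(R)
dense_128 = feature_bytes(128 ** 3)  # 268,435,456 bytes, about 0.27 GB
speedups = (185 / 8, 2296 / 8)       # about 23x (vs NH-Rep) and 287x (vs NGLOD)
```

The speedup ratios computed from the table's training times fall inside the quoted 20–300× range. Doubling `R` roughly quadruples `sparse_cells` but multiplies `dense_codes` by eight, which is the scaling argument behind the second bullet.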
6. Implications for Neural Implicit Geometry Modeling
Voxel grid patchification enables neural implicit representations to faithfully reconstruct sharp and structured 3D geometry, open boundaries, and thin structures, which present significant challenges for conventional monolithic MLP-based models. By “patchifying” the domain and merging local SDFs only in regions of geometric interaction, the approach localizes both parameterization and computation, yielding high accuracy, roughly one to two orders of magnitude faster training in the reported benchmarks, and lower memory demand.
A plausible implication is that such strategies are highly extensible to other domain decomposition tasks in graphics, CAD, and neural field modeling, particularly where feature preservation and computational resource constraints are critical. The architectural decoupling of encoding (local feature grids) and decoding (shared MLP) facilitates both flexibility and scalability, suggesting applicability to larger, more complex scenes and multimodal 3D data.
7. Summary
Voxel grid patchification, as instantiated in Patch-Grid, decomposes complex shapes into per-patch feature volumes that are efficiently trained with a shared MLP and later merged using adaptive, CSG-based operations within a sparse hierarchical structure. This enables ultra-fast training, precise geometric reconstruction, and effective handling of features and boundaries. The method achieves substantial gains over global voxel grid or monolithic CSG paradigms in both efficiency and feature fidelity, offering a robust foundation for further advances in neural implicit 3D shape representations (Lin et al., 2023).